First, every pair of sequences is aligned using the Needleman-Wunsch (NW) algorithm. Each pairwise alignment is then scored for similarity, and that score serves as the distance measure used to build the guide tree. From the guide tree onward, a progressive alignment is built.
As far as my research goes, this part is correct. It matches the text found here (an abridged version is quoted below; note the mention of weights):
http://compbio.pbworks.com/w/page/16252909/Multiple%20Sequence%20Alignment
The CLUSTAL algorithm is widely used profile-based progressive alignment algorithm. The CLUSTAL method relies on the fact that the similarity between sequences is probable to evolutionary. CLUSTALW (Higgins D. et al, 1994) succeeded the CLUSTEL version by adding the 'W' standing for weighted. This represents the program's ability to provide weights to the sequences and to the program parameters. CLUSTALW works using the same progressive alignment algorithm described before, except for its carefully tuned use of the profile alignment method. The initial alignments (first step of the progressive algorithm) used to produce the guide tree may be obtained by either fast k-tuple or pattern-finding approach that is similar to FASTA, or by the slower dynamic programming method.
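To make the first two stages concrete (pairwise NW alignment, then distances, then a guide tree), here is a minimal sketch in Python. It is not how CLUSTALW is actually implemented: the scoring values (match 1, mismatch -1, gap -2), the distance definition (1 minus identity over ungapped columns), and the use of UPGMA clustering for the tree are all simplifying assumptions of mine, and the function names are made up for illustration. Sequence weighting and the progressive profile alignment step are left out entirely.

```python
from itertools import combinations

def nw_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment with a linear gap penalty.
    Returns the two gapped strings. Scoring values are illustrative."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def distance(a, b):
    """Assumed distance: 1 - fraction of identical residues over ungapped columns."""
    x, y = nw_align(a, b)
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        return 1.0
    return 1.0 - sum(p == q for p, q in pairs) / len(pairs)

def guide_tree(seqs):
    """UPGMA clustering over pairwise distances (a simplification).
    seqs maps names to sequences; returns a nested tuple of names."""
    size = {name: 1 for name in seqs}
    dist = {}
    for p, q in combinations(seqs, 2):
        dist[p, q] = dist[q, p] = distance(seqs[p], seqs[q])
    while len(size) > 1:
        # Join the two closest clusters.
        p, q = min(combinations(size, 2), key=lambda pq: dist[pq])
        merged = (p, q)
        for r in size:
            if r not in (p, q):
                # Size-weighted average distance to the new cluster.
                d = (size[p] * dist[p, r] + size[q] * dist[q, r]) / (size[p] + size[q])
                dist[merged, r] = dist[r, merged] = d
        size[merged] = size.pop(p) + size.pop(q)
    return next(iter(size))

if __name__ == "__main__":
    example = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "TTTTGGGG"}
    print(guide_tree(example))
```

In the small example, "a" and "b" differ at a single position, so they are joined first and "c" is attached last. A progressive aligner would then walk this tree from the leaves up, aligning sequences and profiles in that order.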
You're right about the ambiguity around MAFFT; the information is all over the place. One source says the following:
For amino acid alignment, MAFFT uses the BLOSUM62 matrix by default. For nucleotide alignment, a 200PAM log-odds scoring matrix is generated assuming that the transition rate is twice the transversion rate. These matrices are suitable for aligning distantly related sequences. We selected these default parameters based on an expectation that, if the program works well for difficult (distantly related) cases, it should also work well for easy cases.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3603318
Perhaps you can decode what's being presented here better than I can.
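For what it's worth, here is my tentative reading of the nucleotide part, as a rough Python sketch. It rests on assumptions that the quoted text does not confirm: that the substitution model is the Kimura two-parameter model, that "transition rate is twice the transversion rate" means the transition rate is twice the rate of each individual transversion, that 200 PAM means 2.0 expected substitutions per site, and that all four bases have a background frequency of 0.25. The function names are hypothetical and this is not MAFFT's code; the real matrix may be scaled or offset differently.

```python
import math

def k2p_probabilities(pam=200.0, kappa=2.0):
    """Substitution probabilities under the Kimura two-parameter model.
    pam / 100 is taken as the expected number of substitutions per site;
    kappa is the assumed ratio of the transition rate to each transversion rate."""
    d = pam / 100.0
    # Total rate per site is alpha + 2 * beta, with alpha = kappa * beta.
    beta_t = d / (kappa + 2.0)
    alpha_t = kappa * beta_t
    e1 = math.exp(-4.0 * beta_t)
    e2 = math.exp(-2.0 * (alpha_t + beta_t))
    return {
        "same": 0.25 + 0.25 * e1 + 0.5 * e2,        # base unchanged
        "transition": 0.25 + 0.25 * e1 - 0.5 * e2,  # A<->G or C<->T
        "transversion": 0.25 - 0.25 * e1,           # each of the two possible transversions
    }

def log_odds(pam=200.0, kappa=2.0):
    """Log-odds score (in bits) of each substitution class against
    a uniform background frequency of 0.25 per base."""
    probs = k2p_probabilities(pam, kappa)
    return {kind: math.log2(p / 0.25) for kind, p in probs.items()}

if __name__ == "__main__":
    for kind, score in log_odds().items():
        print(f"{kind}: {score:.3f}")
```

Under these assumptions identical bases score highest, transitions score slightly above zero, and transversions score below zero. At such a large evolutionary distance the scores are close together, which fits the quoted remark that the defaults are aimed at distantly related sequences.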