MAFFT versus Clustal - how they align after making a guide tree

someone here

wrote...

5 years ago

Hi!
I am aware that MAFFT uses the fast Fourier Transformation (FFT) as an algorithm. I think Clustal uses Needleman-Wunsch? What I am most unclear about is this:
Both make guide trees based on distances calculated from the frequency of k-mers in common - what happens next? Is that where these algorithms kick in?
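To make the "what happens next" step concrete: once a distance matrix exists, a guide tree is built by repeatedly joining the closest clusters, and the order of those joins is the order in which a progressive aligner merges sequences and groups. The sketch below uses plain UPGMA as a simplification (ClustalW uses neighbour-joining, and MAFFT is described as using a modified UPGMA). The function name upgma_merge_order and the toy distances are hypothetical.

```python
from itertools import combinations


def upgma_merge_order(names, dist):
    """Build a UPGMA guide tree and return the joins in the order they happen.

    names: sequence labels, e.g. ["A", "B", "C", "D"]
    dist:  dict mapping frozenset({x, y}) -> distance for every pair of labels
    Each join is a nested tuple, which is also the subtree created by that join.
    """
    size = {n: 1 for n in names}  # current cluster -> number of sequences in it
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(names, 2)}
    joins = []
    while len(size) > 1:
        # 1. The two closest current clusters are joined (and later aligned) first.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        new = (a, b)
        joins.append(new)
        # 2. UPGMA rule: the distance to the new cluster is the size-weighted
        #    average of the distances to its two halves.
        for c in size:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
    return joins


if __name__ == "__main__":
    toy = {frozenset(p): v for p, v in [
        (("A", "B"), 0.1), (("C", "D"), 0.2), (("A", "C"), 0.6),
        (("A", "D"), 0.6), (("B", "C"), 0.6), (("B", "D"), 0.6),
    ]}
    for step, join in enumerate(upgma_merge_order(["A", "B", "C", "D"], toy), 1):
        print(step, join)
```

With these toy distances A and B are joined first, then C and D, and finally the two pairs are aligned to each other as groups.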
How does FFT do this with the guide tree? I don't mean a mathematical explanation (that would utterly lose me). I understand the rough outline: it takes the polarities and volumes of the amino acids and groups them by similarity, assuming that sequences are homologous if these properties are close enough.
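The FFT idea can be sketched roughly as follows, assuming each residue is replaced by a number. MAFFT uses volume and polarity; the single property and its values in TOY_PROPERTY below are made up. The cross-correlation of the two numeric series, computed with an FFT, peaks at the shift where the sequences line up best. As far as the MAFFT papers describe it, peaks like this are used to locate likely homologous segments when two sequences or groups are aligned, so FFT speeds up the alignment steps rather than replacing the guide tree. This is an illustrative sketch, not MAFFT's code.

```python
import cmath

# Hypothetical single-property values per residue (NOT MAFFT's real volume/polarity scales).
TOY_PROPERTY = {"D": 3.0, "K": 2.5, "L": -3.0, "W": 4.0, "G": 0.5}


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Inverse FFT via the conjugation trick."""
    return [v.conjugate() / len(x) for v in fft([v.conjugate() for v in x])]


def best_offset(seq_a, seq_b, table=TOY_PROPERTY):
    """Shift k maximising sum_n a[n] * b[n + k], i.e. where seq_a best matches seq_b."""
    a = [table[r] for r in seq_a]
    b = [table[r] for r in seq_b]
    # Centre each series so the peak reflects matching patterns, not just overlap length.
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    a = [v - mean_a for v in a]
    b = [v - mean_b for v in b]
    # Zero-pad to a power of two >= len(a) + len(b) so circular wrap-around cannot mix lags.
    n = 1
    while n < len(a) + len(b):
        n *= 2
    fa = fft(a + [0.0] * (n - len(a)))
    fb = fft(b + [0.0] * (n - len(b)))
    # Cross-correlation theorem: corr = IFFT(conj(FFT(a)) * FFT(b)).
    corr = ifft([x.conjugate() * y for x, y in zip(fa, fb)])
    # Negative lags wrap around to the end of the array.
    lags = range(-(len(a) - 1), len(b))
    return max(lags, key=lambda k: corr[k % n].real)


if __name__ == "__main__":
    print(best_offset("DKLW", "GGDKLWGG"))  # -> 2: "DKLW" starts at index 2 of seq_b
```

The whole correlation, for every possible shift, comes out of a few FFTs instead of a separate sum per shift, which is where the speed comes from.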
But how does that involve the tree? Does it do this for each branch of the tree? This is where I get really confused, because both programs seem to use matrices like BLOSUM62. How does that play into it? I thought those were for sum of pairs or weighted sum of pairs in iterative methods, not progressive methods?
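On the BLOSUM62 question: in general, a substitution matrix is consulted whenever two residues are compared, both when a progressive aligner aligns sequences or profiles and when an iterative method scores a whole alignment by (weighted) sum of pairs. A minimal sum-of-pairs scorer might look like this; it uses a few genuine BLOSUM62 entries (for I, L and V only), while the gap penalty GAP and the function names are hypothetical.

```python
from itertools import combinations

# A few genuine BLOSUM62 entries (only I, L and V, to keep the example short).
BLOSUM62_ILV = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
}
GAP = -4  # hypothetical residue-versus-gap penalty


def pair_score(x, y):
    """Score one pair of characters taken from the same alignment column."""
    if x == "-" and y == "-":
        return 0  # gap against gap is ignored here
    if x == "-" or y == "-":
        return GAP
    return BLOSUM62_ILV.get((x, y), BLOSUM62_ILV.get((y, x)))


def sum_of_pairs(rows, weights=None):
    """(Weighted) sum-of-pairs score of an alignment given as equal-length rows."""
    weights = weights or [1.0] * len(rows)
    total = 0.0
    for column in zip(*rows):
        for i, j in combinations(range(len(column)), 2):
            total += weights[i] * weights[j] * pair_score(column[i], column[j])
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["IL", "VL", "I-"]))                   # 6.0
    print(sum_of_pairs(["IL", "VL", "I-"], [1.0, 1.0, 0.5]))  # 6.5
```

Down-weighting a sequence (the 0.5 above) reduces its influence on the total, which is the idea behind "weighted" sum of pairs.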
I have tried reading articles and presentations, but I get mixed impressions, and I have yet to find anything that states the exact order of the steps and what happens in each one. Some sources mention FFT or NW first and then the guide tree; others put it the other way around. The original publications are so math-heavy that they throw me off pretty quickly, and I have not found a comprehensive guide that compares, say, FFT-NS-1 in MAFFT and ClustalW step by step and explains the details in a way that is not written by experts for experts.

Can anyone help me? What I really need is a "better explained" or "for dummies" breakdown of this.

Replies

bio_man

wrote...

Educator

5 years ago

Quote

I am aware that MAFFT uses the fast Fourier Transformation (FFT) as an algorithm.

https://mafft.cbrc.jp/alignment/software/algorithms/algorithms.html This link describes the algorithm used. It's well beyond my current knowledge, though I'm sure it would make more sense to an active biologist.

I've uploaded a PowerPoint that discusses sequence alignment methods, including Clustal. You can access it through the link below or the attachment underneath. In general, Clustal takes as input nucleic acid or protein sequences that have a reasonable degree of similarity over their whole length and produces a multiple sequence alignment as output.

https://biology-forums.com/index.php?action=downloads;sa=view;down=12435

As mentioned in the document, because Clustal uses global alignments, it works best with sequences that are close to the same size, have the same domain architecture, and have similarity through most of their lengths (sources below).

It's unfortunate that my knowledge of bioinformatics is limited, but this is an excellent example of mathematics and biology working hand in hand to make research more efficient. Eye opener.

Let me know if there's anything else I can help you with.

Attached file

Profiles.ppt (318.5 KB)

someone h. Author

wrote...

5 years ago

I was unable to open the PowerPoint, but I did see your other links. I have already looked at the MAFFT site extensively; it and its associated papers are the source of my confusion.
I think I get it for ClustalW:
Initially, pairwise alignments are performed using the NW algorithm; these alignments are then scored for similarity, and the scores are used as the distances that produce the guide tree. From the guide tree onward, a progressive alignment is formed.
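That first ClustalW step can be sketched as a plain Needleman-Wunsch alignment followed by turning percent identity into a distance. The scoring values (match +1, mismatch -1, linear gap -2) are assumptions for illustration; ClustalW itself uses a substitution matrix and affine gap penalties, and can optionally use a faster k-tuple method for this step. Defining distance as 1 minus the identity over gap-free aligned positions is likewise an assumed, commonly used definition.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # leading gaps in t
    for j in range(1, m + 1):
        F[0][j] = j * gap  # leading gaps in s
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Traceback from the bottom-right corner to recover one optimal alignment.
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return F[n][m], "".join(reversed(a)), "".join(reversed(b))


def identity_distance(s, t):
    """1 - fraction of identical residues among aligned, gap-free positions."""
    _, a, b = needleman_wunsch(s, t)
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return 1.0 - sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    print(needleman_wunsch("GATTACA", "GATACA"))     # score 4: six matches, one gap
    print(identity_distance("GATTACA", "GCTTACA"))   # 1 - 6/7, about 0.143
```

Filling this distance in for every pair of sequences gives the matrix from which the guide tree is built.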

But MAFFT?

The original paper suggests that everything is the same as ClustalW, only with FFT used instead of NW.

Then comes their website which says the following:

That the distance matrix is NOT computed with FFT but by counting shared 6-mers (k-mer counting)! And that FFT is used to build the progressive alignment?

!!!

Now, I found the paper which introduces the 6-mer distance matrix component, but according to this paper, it was added as a second algorithm called PartTree.

I am stumped.
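The 6-mer distance described on the MAFFT site needs no alignment at all, which is why it is fast. A sketch follows; the normalisation (shared 6-mers divided by the k-mer count of the shorter sequence) and the function names are assumptions, not necessarily MAFFT's exact formula.

```python
from collections import Counter


def kmer_counts(seq, k=6):
    """Count every overlapping k-mer (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(s, t, k=6):
    """1 - (shared k-mers / k-mers in the shorter sequence); an assumed normalisation."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    shared = sum((cs & ct).values())  # multiset intersection: min count per k-mer
    smaller = min(sum(cs.values()), sum(ct.values()))
    return 1.0 - shared / smaller if smaller else 1.0


if __name__ == "__main__":
    # 2 of the 5 six-mers are shared (ACDEFG, CDEFGH), so the distance is about 0.6.
    print(kmer_distance("ACDEFGHIKL", "ACDEFGHWWW"))
```

Because only counting is involved, the cost grows roughly with sequence length rather than with the product of the two lengths, as a full dynamic-programming alignment would.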

bio_man

wrote...

Educator

5 years ago

Quote from: someone here (5 years ago)

Initially, pairwise alignments are performed using the NW algorithm; these alignments are then scored for similarity, and the scores are used as the distances that produce the guide tree. From the guide tree onward, a progressive alignment is formed.

This part is correct as far as my research goes, according to the text found here (an abridged version, which mentions the use of weights, is provided below): http://compbio.pbworks.com/w/page/16252909/Multiple%20Sequence%20Alignment

The CLUSTAL algorithm is a widely used, profile-based progressive alignment algorithm. The CLUSTAL method relies on the assumption that similarity between sequences is likely to reflect evolutionary relatedness. CLUSTALW (Thompson, Higgins and Gibson, 1994) succeeded the earlier CLUSTAL version by adding the 'W', which stands for weighted. This reflects the program's ability to assign weights to the sequences and to the program parameters. CLUSTALW uses the same progressive alignment algorithm described before, except for its carefully tuned use of the profile alignment method. The initial alignments (the first step of the progressive algorithm) used to produce the guide tree may be obtained either by a fast k-tuple or pattern-finding approach similar to FASTA, or by the slower dynamic programming method.
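The "profile alignment" step in that quote (aligning two already-aligned groups as the guide tree is climbed) can be sketched as Needleman-Wunsch over columns instead of single residues. Optional per-sequence weights are included to loosely mirror the 'W' in CLUSTALW; how ClustalW actually derives its weights (from guide-tree branch lengths) is not shown. The scoring scheme (match +1, mismatch -1, residue against gap 0, gap column -2) and the function names are hypothetical.

```python
def column_score(col_a, col_b, w_a, w_b, match=1.0, mismatch=-1.0):
    """Weighted average score over all residue pairs between two columns (gaps score 0 here)."""
    total = norm = 0.0
    for x, wx in zip(col_a, w_a):
        for y, wy in zip(col_b, w_b):
            norm += wx * wy
            if x != "-" and y != "-":
                total += wx * wy * (match if x == y else mismatch)
    return total / norm


def align_profiles(prof_a, prof_b, w_a=None, w_b=None, gap=-2.0):
    """Merge two alignments (lists of equal-length rows) with NW over columns."""
    w_a = w_a or [1.0] * len(prof_a)
    w_b = w_b or [1.0] * len(prof_b)
    cols_a, cols_b = list(zip(*prof_a)), list(zip(*prof_b))
    n, m = len(cols_a), len(cols_b)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    ptr = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0], ptr[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        F[0][j], ptr[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (F[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1], w_a, w_b), "diag"),
                (F[i - 1][j] + gap, "up"),    # gap column inserted into profile B
                (F[i][j - 1] + gap, "left"),  # gap column inserted into profile A
            ]
            F[i][j], ptr[i][j] = max(options, key=lambda o: o[0])
    gaps_a, gaps_b = ("-",) * len(prof_a), ("-",) * len(prof_b)
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "diag":
            merged.append(cols_a[i - 1] + cols_b[j - 1]); i -= 1; j -= 1
        elif move == "up":
            merged.append(cols_a[i - 1] + gaps_b); i -= 1
        else:
            merged.append(gaps_a + cols_b[j - 1]); j -= 1
    merged.reverse()
    return ["".join(col[r] for col in merged) for r in range(len(prof_a) + len(prof_b))]


if __name__ == "__main__":
    for row in align_profiles(["GATTACA", "GATTACA"], ["GATACA"]):
        print(row)
```

Gaps already present inside a group are kept, and new gaps are inserted as whole columns, which is why early gaps in a progressive alignment can never be undone later.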

You're right about the ambiguity surrounding MAFFT; the information is all over the place. One source mentions the following:

For amino acid alignment, MAFFT uses the BLOSUM62 matrix by default. For nucleotide alignment, a 200PAM log-odds scoring matrix is generated assuming that the transition rate is twice the transversion rate. These matrices are suitable for aligning distantly related sequences. We selected these default parameters based on an expectation that, if the program works well for difficult (distantly related) cases, it should also work well for easy cases.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3603318
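The nucleotide matrix in that quote can be reconstructed in spirit: start from a 1-PAM substitution matrix (1% expected change per base), make transitions twice as likely as transversions, raise the matrix to the 200th power, and take log-odds against the background frequency. Reading "twice" per pair of bases, uniform base frequencies (0.25) and log base 2 are all assumptions here, and MAFFT's actual scaling and rounding are not reproduced; this is a sketch, not MAFFT's code.

```python
import math

BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def pam1_matrix(ts_tv=2.0, change=0.01):
    """1-PAM matrix: each base changes with total probability `change`.

    Assumption: the single transition partner gets ts_tv * r and each of the
    two transversion partners gets r, so ts_tv * r + 2 * r = change.
    """
    r = change / (ts_tv + 2.0)
    return [[1.0 - change if x == y else (ts_tv * r if (x, y) in TRANSITIONS else r)
             for y in BASES] for x in BASES]


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def pam_log_odds(distance=200, ts_tv=2.0):
    """Log2-odds scores from the PAM-`distance` matrix, assuming uniform base frequencies."""
    p1 = pam1_matrix(ts_tv)
    p = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(distance):
        p = matmul(p, p1)  # p becomes p1 raised to the power `distance`
    return {(x, y): math.log2(p[i][j] / 0.25)
            for i, x in enumerate(BASES) for j, y in enumerate(BASES)}


if __name__ == "__main__":
    scores = pam_log_odds()
    for pair in [("A", "A"), ("A", "G"), ("A", "C")]:
        print(pair, round(scores[pair], 3))
```

Under these assumptions identities score highest, transitions (A/G) slightly positive and transversions (A/C) negative, so a transition costs less than a transversion, as intended.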

Perhaps you can decode what's being presented here better than I can.

someone h. Author

wrote...

5 years ago

Thank you for trying to understand all this crazy stuff!
I wrote the creators of MAFFT and they sent me this explanation, which definitely sheds some light on how these things work.

Attached file

bio_man

wrote...

Educator

5 years ago

Man, that's cool, you've potentially got the programmer explaining how it works! Thanks for updating.

someone h. Author

wrote...

5 years ago

To update even further: G-INS-i doesn't actually use NW in its pairwise-alignment step (the first step) but the fast Fourier transform (FFT). The developer corrected this in a later conversation, but also indicated that the results are practically the same with either algorithm.
I still had a few more questions for him and am excited to hear his answers, which I will post here, perhaps with a final outline summary. On the faint chance that someone else needs to know all of this in the future, I hope this thread helps a little.
It can really be challenging to weed through some of the literature, the misinterpretations of said literature and all the other crazy inconsistencies that arise in information when you are completely new to a subject and don't have the necessary background to make sense of any of it.

bio_man

wrote...

Educator

5 years ago Edited: 5 years ago, bio_man

Quote from: someone here (5 years ago)

On the faint chance that someone else needs to know all of this in the future, I hope this thread helps a little.

I had the same thought in mind; looking forward to the update.

someone h. Author

wrote...

5 years ago

So, this is what the developer was kindly able to provide concerning the exact order of steps in L-INS-i and L-INS-1 (the local aligners) and G-INS-i and G-INS-1 (the global aligners).

So the information in the earlier screenshot was correct for everything except these methods; only they needed updating.
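The practical difference between a local and a global aligner can be seen by comparing a textbook Smith-Waterman (local) score with a Needleman-Wunsch (global) score for a short sequence embedded in a longer one. The scoring values and function names are hypothetical, and neither function is MAFFT's implementation.

```python
def needleman_wunsch_score(s, t, match=2, mismatch=-1, gap=-2):
    """Global score: every residue of both sequences must be aligned or gapped."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap] + [0] * len(t)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def smith_waterman_score(s, t, match=2, mismatch=-1, gap=-2):
    """Local score: the best-scoring pair of substrings; negative runs are reset to 0."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    longer, core = "TTTTGATTACATTTT", "GATTACA"
    print(smith_waterman_score(longer, core))    # 14: seven matches, flanks ignored
    print(needleman_wunsch_score(longer, core))  # -2: the same matches minus eight end gaps
```

The global score is dragged down by the unrelated flanks, while the local score only counts the shared core; that is why a local method suits sequences that share a domain but differ elsewhere, and a global method suits sequences that are similar along their whole length.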

Attached file

bio_man

wrote...

Educator

5 years ago

I feel like I've just read classified information.

Any other questions, let us know.

Biology Forums - Study Force © 2010-2024 \| Powered by SMF \| SMF © 2015, Simple Machines \| Sitemap Biology-Forums.com is not affiliated with any publisher. Book covers, title and author names appear for reference only.