% mafft in > out
If this terminates abnormally or you have a very large number (>10,000) of sequences to align, try manually selecting an appropriate combination of the following options.
| Argument | Description | Default |
|---|---|---|
| --retree 1 | Approximately two times faster but rougher than the default | --retree 2 |
| --maxiterate 2 | Improves accuracy but is not applicable to many sequences | --maxiterate 0 |
| --memsave | Saves memory but is approximately two times slower | auto |
| --fft | For long (∼1,000,000 nt) conserved sequences | auto |
| --nofft | For many (∼5,000) sequences | auto |
| --parttree | For extremely many (>10,000) sequences | disabled |
| --dpparttree | For extremely many (>10,000) sequences | disabled |
| --fastaparttree | For extremely many (>10,000) sequences | disabled |
| --partsize 1000 | More accurate than default | --partsize 50 |
| --groupsize 1 | Does not align; the sequences are sorted according to similarity. Recommended for use with --reorder. | --groupsize (large) |
| --treeout | Outputs the guide tree | disabled |
Bug information:
Version 5.830 (2006/04/24) crashes when a long (>32,767) gap is
being inserted.
Please update to v5.850 or higher.
Bug information:
Versions 6.619 - 6.704 have a problem with this feature.
Please update to v6.705 or higher (2009/05/17).
MAFFT requires memory space proportional to $L^2$ by default, where $L$ is the sequence length. When the --memsave option is added or the alignment length exceeds a threshold, however, a linear-space DP algorithm similar to that of Myers & Miller (1988) is used. It has not yet been tested whether this algorithm reduces the accuracy of the resulting alignment. Moreover, it is approximately two times slower than a normal DP. If you have plenty of RAM, add --nomemsave to always apply a normal DP (versions ≥6.620 only).
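To get a feel for the quadratic versus linear growth described above, the following is a rough back-of-the-envelope sketch. The function name `dp_memory_bytes`, the 4 bytes per cell, and the two-row model of the linear-space DP are illustrative assumptions, not MAFFT's actual memory layout.

```python
def dp_memory_bytes(length, bytes_per_cell=4, linear_space=False):
    """Rough working-space estimate for a DP over sequences of the given length.

    bytes_per_cell and the two-row model are assumptions for illustration only.
    """
    if linear_space:
        # A linear-space DP keeps only a constant number of rows (assumed 2 here).
        return 2 * length * bytes_per_cell
    # A normal DP stores a full L x L matrix.
    return length * length * bytes_per_cell


for length in (1_000, 10_000, 100_000):
    full = dp_memory_bytes(length)
    linear = dp_memory_bytes(length, linear_space=True)
    print(f"L={length:>7}: normal DP ~{full / 1e6:,.0f} MB, "
          f"linear-space DP ~{linear / 1e6:,.3f} MB")
```

Under these assumptions, a tenfold increase in sequence length costs a hundredfold more memory for the normal DP, which is why the linear-space variant takes over for long alignments.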
When the similarity among input sequences is high and the number of sequences $N$ is small (up to ∼100), the FFT approximation is highly recommended to reduce the CPU time of the DP process from $O(L^2)$ to $O(L)$.
% mafft --fft --(no)memsave in > out
Time complexity: $O(NL)+O(N^3)$ (when input sequences are highly conserved) to $O(NL^2)+O(N^3)$ (when the similarity among input sequences is weak)
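As a worked illustration of the two complexity bounds, the sketch below evaluates both expressions with all constant factors set to 1. The function name `dp_cost` and the unit constants are assumptions, so only the ratio between the two cases is meaningful, not the absolute numbers.

```python
def dp_cost(n, length, conserved):
    """Evaluate N*L + N^3 (conserved input) or N*L^2 + N^3 (weak similarity).

    Constant factors are taken as 1, which is an illustrative simplification.
    """
    per_sequence = length if conserved else length ** 2
    return n * per_sequence + n ** 3


n, length = 100, 1_000
best = dp_cost(n, length, conserved=True)
worst = dp_cost(n, length, conserved=False)
print(f"conserved: {best:,}  weakly similar: {worst:,}  ratio: {worst / best:.1f}")
```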
Re-estimation of the guide tree can be disabled with --retree 1, which reduces accuracy but approximately doubles the speed compared with the default.
% mafft --fft --(no)memsave --retree 1 in > out
Iterative refinement can be applied to improve the accuracy only when the similarity is high.
% mafft --fft --(no)memsave --maxiterate 2 in > out
Note that MAFFT is applicable only to globally homologous input sequences. If the sequences contain repeats or inversions, use other tools such as FASTA and MUMmer.
Re-estimation of the guide tree can be disabled with --retree 1, which reduces accuracy but approximately doubles the speed compared with the default.
% mafft --retree 1 in > out
Time complexity: $O(NL^2)+O(N^3)$
A key technique for handling many sequences is the 3-mer- or 6-mer-based algorithm for roughly estimating pairwise distances (Higgins & Sharp 1988; Jones et al. 1992; Katoh et al. 2002). Another program package, MUSCLE (Edgar 2004), adopted the same algorithm. MUSCLE is worth trying because it has a more efficient UPGMA routine than that of MAFFT.
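The idea of a k-mer-based rough distance can be sketched as follows. This is a generic shared-k-mer fraction written for illustration; the function names, the normalization by the shorter sequence, and the use of the raw alphabet are assumptions and are not necessarily the exact formula used by MAFFT or MUSCLE.

```python
from collections import Counter


def kmer_counts(seq, k=6):
    """Count every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=6):
    """Rough distance in [0, 1]: 1 minus the fraction of shared k-mers.

    The fraction is taken relative to the number of k-mers in the shorter
    sequence (an illustrative choice). No alignment is computed.
    """
    n_kmers_shorter = min(len(a), len(b)) - k + 1
    if n_kmers_shorter <= 0:
        raise ValueError("both sequences must be at least k residues long")
    # Counter intersection keeps the minimum count of each shared k-mer.
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / n_kmers_shorter


print(kmer_distance("ACGTACGTACGT", "ACGTACGTTCGT", k=3))
```

Because each sequence is scanned only once and no DP is involved, a full distance matrix for many sequences can be built far faster than with pairwise alignment, at the price of a rougher estimate.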
% mafft --(no)memsave --retree 1 in > out
Time complexity: $O(NL^2)+O(N^3)$
% mafft --parttree --retree 1 in > out
% mafft --parttree --retree 2 in > out
% mafft --parttree --retree 2 --partsize 1000 in > out
% mafft --fastaparttree --retree 2 --partsize 1000 in > out
The above options are tested using only a small number of examples. Please send a bug report to the author if you have any trouble using these options.
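The option choices discussed in this section can be collected into a small helper that proposes a command line from the number of input sequences. This is only a sketch: the function names are hypothetical, the thresholds (100 and 10,000) are the rough figures quoted in the text, and whether the input is "highly similar" has to be judged by the user.

```python
def suggest_mafft_args(n_sequences, highly_similar=False):
    """Propose MAFFT options from the rough size classes described in the text."""
    if n_sequences > 10_000:
        # Extremely many sequences: fastest of the --parttree variants shown.
        return ["--parttree", "--retree", "1"]
    if n_sequences <= 100 and highly_similar:
        # Few, highly similar sequences: FFT approximation is recommended.
        return ["--fft"]
    # Otherwise: skip guide-tree re-estimation for roughly doubled speed.
    return ["--retree", "1"]


def build_command(infile, n_sequences, highly_similar=False):
    """Return the argument list; the alignment itself goes to standard output."""
    return ["mafft", *suggest_mafft_args(n_sequences, highly_similar), infile]


def count_fasta_records(lines):
    """Count sequences in FASTA-formatted text by counting header lines."""
    return sum(1 for line in lines if line.startswith(">"))


print(" ".join(build_command("in", 20_000)))
```

The returned list can be passed to `subprocess.run` with `stdout` pointed at an output file, which mirrors the `mafft ... in > out` form used throughout this page.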