MAFFT version 6

Multiple alignment program for amino acid or nucleotide sequences

Benchmark results (unfinished)

Benchmarks by third parties can be found in Nuin et al. 2006 (protein), Ahola et al. 2006 (protein) and Wilm et al. 2006 (RNA).

The following are the results of six benchmark tests (BAliBASE, PREFAB, SABmark, OXBench, IRMBase and a subset of HOMSTRAD) for protein alignment, performed in May 2005. MAFFT-L-INS-i and E-INS-i show the highest accuracy scores among currently available sequence alignment programs. However, the differences among MAFFT-L-INS-i, E-INS-i, TCoffee and ProbCons are quite small and not statistically significant in most cases, so TCoffee and ProbCons are also worth trying.
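For reference, the MAFFT options compared below are typically invoked as follows (a sketch only; input and output file names are placeholders, and the iteration limit of 1,000 is the setting noted in the table footnotes rather than the default of the FFT-NS-i alias):

  # L-INS-i: iterative refinement using local pairwise alignment information
  mafft --localpair --maxiterate 1000 input.fasta > output.aln
  # E-INS-i: iterative refinement using generalized affine gap costs (offset 0)
  mafft --ep 0 --genafpair --maxiterate 1000 input.fasta > output.aln
  # G-INS-i: iterative refinement using global pairwise alignment information
  mafft --globalpair --maxiterate 1000 input.fasta > output.aln
  # FFT-NS-i: iterative refinement (here with the 1,000-iteration limit used in these tests)
  mafft --retree 2 --maxiterate 1000 input.fasta > output.aln
  # FFT-NS-2 and FFT-NS-1: progressive methods
  mafft --retree 2 input.fasta > output.aln
  mafft --retree 1 input.fasta > output.aln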

BAliBASE version 3 (full-length sequences)

BAliBASE has recently been substantially updated. L-INS-i outperforms ProbCons and the other methods on the full-length dataset, because the full-length sequences contain non-homologous residues at their N- and C-termini and a local alignment algorithm works well in such a situation.

Method Ref1.1 Ref1.2 Ref2 Ref3 Ref4 Ref5 Overall average CPU time (s)
Consistency based methods
MAFFT 5.662 L-INS-i 67.11 / 44.61 93.62 / 83.73 92.57 / 45.17 85.58 / 56.83 91.91 / 59.47 90.15 / 58.44 87.05 / 58.64 5,500
MAFFT 5.662 E-INS-i 66.13 / 44.53 93.54 / 83.18 92.64 / 44.32 86.08 / 58.53 91.42 / 59.02 89.93 / 59.13 86.91 / 58.55 6,000
ProbCons 1.10 (default) 66.99 / 41.68 94.12 / 85.52 91.67 / 40.54 84.60 / 54.30 90.52 / 54.37 89.28 / 56.50 86.46 / 55.99 43,000
ProbCons 1.10 (trained) 66.73 / 41.47 94.13 / 85.38 91.85 / 42.00 84.47 / 54.03 89.79 / 51.94 89.34 / 57.69 86.27 / 55.71 (44,000)
MAFFT 5.662 G-INS-i 60.46 / 34.53 92.42 / 81.32 90.34 / 38.71 85.27 / 52.73 88.37 / 52.51 87.87 / 52.75 84.23 / 52.64 6,900
TCoffee 2.46 61.48 / 33.63 93.04 / 82.36 91.71 / 39.68 81.61 / 48.87 89.22 / 52.90 89.03 / 57.13 84.56 / 52.76 (210,000)
Iterative refinement methods
MAFFT 5.662 FFT-NS-i 58.87 / 33.47 91.64 / 80.11 89.54 / 40.37 83.27 / 49.97 87.11 / 47.37 86.27 / 52.44 82.95* / 50.97* 2,800
Muscle 3.52 (most accurate option) 56.62 / 30.87 90.96 / 79.59 88.90 / 35.17 81.07 / 37.87 85.90 / 45.06 85.17 / 46.19 81.67** / 46.79* 3,400
PRRN 3.11 58.21 / 34.74 92.16 / 79.20 90.46 / 41.66 82.68 / 47.83 85.93 / 47.98 83.83 / 47.56 82.61* / 50.73* 250,000
MAFFT 3.89 † FFT-NS-i 54.56 / 30.26 90.78 / 78.61 90.12 / 37.46 82.65 / 49.33 87.83 / 50.76 85.65 / 49.31 82.16* / 50.27* 3,600
ClustalW 2.0 (Iteration=tree; Sep, 2007) 49.94 / 25.08 88.91 / 75.32 85.80 / 21.61 72.78 / 30.43 81.20 / 40.84 76.49 / 35.06 76.67** / 39.58** (58,000)
Progressive methods
Kalign 1.0 54.51 / 27.79 91.17 / 78.59 87.79 / 29.56 79.69 / 35.47 83.02 / 42.57 84.59 / 44.75 80.25** / 44.00** 480
MAFFT 5.662 FFT-NS-2 51.80 / 26.92 88.79 / 71.55 88.61 / 36.76 80.78 / 40.17 84.57 / 40.06 83.59 / 46.81 79.88** / 44.01** 250
MAFFT 5.662 FFT-NS-1 50.15 / 22.76 88.16 / 72.32 88.03 / 32.98 79.47 / 34.37 82.96 / 41.92 81.18 / 42.06 78.63** / 42.00** 140
Muscle 3.52 (fastest option) 53.36 / 26.97 88.79 / 72.32 86.39 / 29.37 77.74 / 32.93 79.38 / 34.47 76.59 / 35.56 77.63** / 39.71** 160
ClustalW 1.83 50.06 / 22.74 86.43 / 71.14 85.20 / 21.98 72.50 / 27.23 78.82 / 39.55 74.24 / 30.75 75.34** / 37.35** 2,000
The SP and TC scores are shown for each method. † Obsolete version of MAFFT. For the iterative options of MAFFT and Muscle, the maximum number of iterations was set to 1,000. The significance of the difference from the most accurate method is indicated by * (p<0.05) or ** (p<0.01) (Wilcoxon test), for the overall average only.

BAliBASE version 3 (homologous regions only)

This dataset is similar to the old version of BAliBASE (v2). Global algorithms are suitable for globally alignable datasets like this. In particular, ProbCons outperforms other methods including the global option of MAFFT (G-INS-i).

Method Ref1.1 Ref1.2 Ref2 Ref3 Ref4 Ref5 Overall average CPU time (s)
ProbCons 1.10 81.03 / 63.08 95.04 / 87.07 95.75 / 60.44 90.68 / 65.17 - / - 90.91 / 60.53 90.89 / 68.86 15,000
MAFFT 5.667 G-INS-i 77.02 / 56.95 93.69 / 84.30 95.43 / 58.66 90.15 / 64.77 - / - 90.59 / 58.73 89.44 / 66.08 1,100
MAFFT 5.667 L-INS-i 75.20 / 55.34 93.88 / 84.43 94.81 / 53.85 89.20 / 63.57 - / - 90.05 / 59.33 88.70 / 64.42 970
MAFFT 5.667 E-INS-i 72.42 / 51.79 93.81 / 84.20 94.56 / 51.98 88.94 / 62.97 - / - 90.25 / 59.53 87.97* / 63.01* 1,400
MAFFT 5.667 FFT-NS-i 74.57 / 55.24 92.73 / 82.95 95.03 / 55.78 88.37 / 60.83 - / - 88.19 / 54.80 88.00** / 63.59* 300
Muscle 3.52 (most accurate option) 73.82 / 51.07 92.67 / 81.95 94.91 / 55.63 87.57 / 57.90 - / - 87.57 / 57.90 87.54** / 61.60** 360
The SP and TC scores are shown for each method. For the iterative options of MAFFT and Muscle, the maximum number of iterations was set to 1,000. The significance of the difference from the most accurate method is indicated by * (p<0.05) or ** (p<0.01) (Wilcoxon test), for the overall average only.

PREFAB version 3

As expected, the differences among methods are large when sequence similarity is low.

Method %id: 0-20 %id: 20-40 %id: 40-70 %id: 70-100 Overall CPU time (s)
MAFFT 5.662 L-INS-i 48.55 82.98 * 96.57 98.66 69.77 14,000
MAFFT 5.662 E-INS-i 48.61 83.00 * 95.71 98.66 69.75 * 19,000
ProbCons 1.10 47.58 83.43 96.14 98.10 69.43 210,000
MAFFT 5.662 G-INS-i 46.90 82.69 * 96.36 98.60 68.89 18,000
TCoffee 2.46 44.51 ** 81.84 ** 95.46 ** 98.63 67.42 **
MAFFT 5.662 FFT-NS-i 45.72 ** 81.00 ** 95.52 ** 98.67 67.65 ** 4,500
Muscle 3.52 (most accurate option) 43.06 ** 80.60 ** 94.90 ** 98.44 66.26 ** 8,600
MAFFT 3.89 † FFT-NS-i 42.14 ** 79.31 ** 95.31 ** 98.63 65.33 ** 6,900
ClustalW 1.83 33.96 ** 74.14 ** 93.54 ** 97.85 59.45 ** 13,000
The fraction of correctly aligned sites (%) is shown for each method. † Obsolete version of MAFFT. The difference from the new FFT-NS-i (v5.662) is only in the parameters (--jtt 200 --op 2.4 --ep 0.06 in v3.88, whereas --bl 62 --op 1.53 --ep 0.123 in v5.662). * p<0.05; ** p<0.01 (Wilcoxon test)
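As a rough illustration, the two parameter sets quoted in the footnote correspond to FFT-NS-i command lines like the following (a sketch only; file names are placeholders and the values are simply those listed above):

  # old parameters: JTT PAM 200 matrix, heavier gap penalties
  mafft --retree 2 --maxiterate 1000 --jtt 200 --op 2.4 --ep 0.06 input.fasta > old_params.aln
  # new parameters: BLOSUM62 matrix, lighter gap opening penalty, larger offset
  mafft --retree 2 --maxiterate 1000 --bl 62 --op 1.53 --ep 0.123 input.fasta > new_params.aln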

HOMSTRAD (55 entries)

The advantage of MAFFT becomes larger when dozens or hundreds of sequences are aligned simultaneously. The outline of this test is as follows (see the MAFFT version 5 paper for details): (1) A number (n) of homologues (n = 20, 50, 100) were collected from SwissProt using BLAST (E < 10^-10). (2) The homologues were aligned together with the original input sequences. (3) The homologues were then removed from the alignment. (4) The accuracy of the alignment was computed by comparing the reference alignment with the resulting alignment from which the homologues had been removed.
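This procedure can be sketched with current BLAST+ and MAFFT command names as follows (a minimal sketch under the assumption of a locally formatted SwissProt database; the database name, file names, sequence limit and the final scoring step are placeholders, not the scripts actually used for this benchmark):

  # (1) collect up to n homologues of a HOMSTRAD entry from SwissProt (E < 1e-10)
  blastp -query entry.fasta -db swissprot -evalue 1e-10 -max_target_seqs 100 \
         -outfmt "6 sseqid" | sort -u > hit_ids.txt
  blastdbcmd -db swissprot -entry_batch hit_ids.txt > homologues.fasta
  # (2) align the homologues together with the original entry sequences
  cat entry.fasta homologues.fasta > combined.fasta
  mafft --localpair --maxiterate 1000 combined.fasta > combined.aln
  # (3)+(4) remove the homologue rows and compare the remaining alignment
  #         with the HOMSTRAD reference alignment (scoring script not shown)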

Method n=0 n=20 n=50 n=100
ProbCons 1.10 46.65 50.17 51.07 51.62
MAFFT 5.662 L-INS-i 46.19 51.26 53.67 54.14
MAFFT 5.662 E-INS-i 44.85 50.16 52.77 53.33
MAFFT 5.662 G-INS-i 42.82 * 51.28 53.87 55.37
TCoffee 2.46 42.20 ** 47.39 ** 48.80 ** 48.50 **
MAFFT 5.662 FFT-NS-i 43.34 ** 49.49 50.65 ** 50.74 **
Muscle 3.52 (most accurate option) 43.14 ** 46.39 ** 47.05 ** 48.87 **
MAFFT 3.89 † FFT-NS-i 38.21 ** 44.26 ** 45.52 ** 42.85 **
ClustalW 1.83 36.77 ** 36.57 ** 37.33 ** 36.77 **
† Obsolete version of MAFFT. The difference from the new FFT-NS-i (v5.662) is only in the parameters (--jtt 200 --op 2.4 --ep 0.06 in v3.88, whereas --bl 62 --op 1.53 --ep 0.123 in v5.662).

SABmark version 1.65

The SABmark test shows results similar to those of HOMSTRAD. However, two parameters (--op and --ep) of MAFFT were determined by applying the FFT-NS-2 algorithm to a subset of SABmark, and that training set is included in this analysis, so the result might be biased toward MAFFT. L-INS-i and E-INS-i have four and six additional parameters, respectively, which have not yet been closely tuned; tuning them could further increase the accuracy of L-INS-i, but such a parameter set has not yet been adopted as the default.

Method n=0 n=20 n=50
MAFFT 5.662 L-INS-i 44.62 (19.30) 48.79 (22.04) 49.65 (22.29)
MAFFT 5.662 E-INS-i 42.53 (17.74) 45.99 (19.66) 48.94 (21.64)
MAFFT 5.662 G-INS-i 42.25 (18.38) 47.69 (21.88) 51.47 (23.53)
TCoffee 2.46 42.48 (17.74) 42.93 (17.45) 44.12 (17.41)
ProbCons 1.10 43.03 (17.28) 42.34 (16.82) 42.22 (16.74)
MAFFT 5.662 FFT-NS-i 37.11 (15.28) 40.13 (17.42) 44.48 (18.61)
Muscle 3.52 (most accurate option) 36.53 (14.39) 36.74 (15.55) 38.33 (16.41)
MAFFT 3.89 † FFT-NS-i 30.59 (11.45) 34.01 (13.52) 34.68 (14.62)
The fD values are shown for each method in the SABmark tests. † Obsolete version of MAFFT. The difference from the new FFT-NS-i (v5.662) is only in the parameters (--jtt 200 --op 2.4 --ep 0.06 in v3.88, whereas --bl 62 --op 1.53 --ep 0.123 in v5.662).

OXBench

For the master subset of OXBench, MAFFT-L-INS-i is less accurate than Muscle, ProbCons and ClustalW, and the difference between MAFFT and the other methods is statistically significant. In contrast, L-INS-i significantly outperforms the other methods on the extended subset of OXBench.

Method master (∼5.7 seqs) extended (∼99 seqs) full (∼6.0 seqs)
Muscle 3.52 (most accurate option) 84.37 / 7.306 86.76 ** / 7.339 ** 74.22 ** / 6.750
ProbCons 1.10 84.12 / 7.261 ** 87.79 ** / 7.108 ** 75.14 / 6.753
MAFFT 5.662 L-INS-i 83.01 ** / 7.226 ** 88.80 / 7.409 74.75 ** / 6.761
MAFFT 3.89 † FFT-NS-i 82.77 ** / 7.210 ** 86.09 ** / 7.302 ** 73.83 ** / 6.677 **
ClustalW 1.83 83.80 / 7.300 83.29 ** / 7.178 ** 72.68 ** / 6.691
The column score / STAMP score is shown for each method. † Obsolete version of MAFFT. The difference from the new FFT-NS-i (v5.662) is only in the parameters (--jtt 200 --op 2.4 --ep 0.06 in v3.88, whereas --bl 62 --op 1.53 --ep 0.123 in v5.662).

IRMBase

IRMBase is a benchmark for situations entirely different from those of BAliBASE. Each reference alignment of IRMBase has one or more short conserved motifs embedded in long unalignable residues; the number of motifs is 1, 2 or 3 in ref1, ref2 or ref3, respectively. In such a situation, DIALIGN-T and E-INS-i work well. For ref3, E-INS-i works well because of its generalized affine gap cost. E-INS-i also works well for ref1, but this is because of its parameters (--ep 0 in E-INS-i), not because of the algorithm.
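To separate the effect of the --ep 0 setting from that of the alignment algorithm, the same offset value can also be applied to other options; for example (a hypothetical check for illustration, not part of the benchmark itself):

  # L-INS-i run with the offset value set to 0, as in E-INS-i
  mafft --localpair --maxiterate 1000 --ep 0 input.fasta > output.aln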

Method Ref1 Ref2 Ref3
DIALIGN-T 83.02 80.86
MAFFT 5.662 E-INS-i 87.64 80.42
MAFFT 5.662 L-INS-i 76.50 63.07
TCoffee 2.46 68.08 65.43
ProbCons 1.10 51.80 60.65
MAFFT 5.662 FFT-NS-i 34.60 47.86
Muscle 3.52 (fastest option) 34.60 45.99
Muscle 3.52 (most accurate option) 10.88 25.27
MAFFT 3.89 † FFT-NS-i 58.20 68.41
MAFFT 5.662 G-INS-i 44.83 32.95
ClustalW 1.83 0 5.296
The fractions of correctly aligned sites (%) are shown.