MAFFT version 6

Multiple alignment program for amino acid or nucleotide sequences

Benchmark results (unfinished)

Benchmarks by third parties can be found in Nuin et al. 2006 (protein), Ahola et al. 2006 (protein) and Wilm et al. 2006 (RNA).

The following are the results of five benchmark tests for protein alignment, BAliBASE, PREFAB, SABmark, IRMBase and a subset of HOMSTRAD, performed in May 2005. MAFFT-L-INS-i and E-INS-i show the highest accuracy scores among currently available sequence alignment programs. However, the differences among MAFFT-L-INS-i, E-INS-i, TCoffee and ProbCons are quite small and not statistically significant in most cases. Thus TCoffee and ProbCons are also worth trying.
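For reference, the MAFFT strategies compared in the tables below are typically invoked as follows (a sketch assuming the wrapper scripts installed with MAFFT; input/output file names are placeholders):

```shell
mafft-linsi input.fasta > output.aln    # L-INS-i: iterative refinement using local pairwise information
mafft-einsi input.fasta > output.aln    # E-INS-i: for sequences with long unalignable regions
mafft-ginsi input.fasta > output.aln    # G-INS-i: iterative refinement using global pairwise information
mafft-fftnsi input.fasta > output.aln   # FFT-NS-i: progressive method plus iterative refinement
mafft input.fasta > output.aln          # FFT-NS-2: fast progressive method (default strategy)
```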

BAliBASE version 3 (full-length sequences)

BAliBASE has recently been substantially updated. L-INS-i outperforms ProbCons and other methods on the full-length dataset, because the full-length sequences have non-homologous residues at the N-/C-termini, and a local alignment algorithm works well in such a situation.

Method Ref11 Ref12 Ref2 Ref3 Ref4 Ref5 Overall average CPU time (s)
Consistency-based methods
MAFFT 5.662 L-INS-i 67.11 / 44.61 93.62 / 83.73 92.57 / 45.17 85.58 / 56.83 91.91 / 59.47 90.15 / 58.44 87.05 / 58.64 5,500
MAFFT 5.662 E-INS-i 66.13 / 44.53 93.54 / 83.18 92.64 / 44.32 86.08 / 58.53 91.42 / 59.02 89.93 / 59.13 86.91 / 58.55 6,000
ProbCons 1.10 (default) 66.99 / 41.68 94.12 / 85.52 91.67 / 40.54 84.60 / 54.30 90.52 / 54.37 89.28 / 56.50 86.46 / 55.99 43,000
ProbCons 1.10 (trained) 66.73 / 41.47 94.13 / 85.38 91.85 / 42.00 84.47 / 54.03 89.79 / 51.94 89.34 / 57.69 86.27 / 55.71 (44,000)
MAFFT 5.662 G-INS-i 60.46 / 34.53 92.42 / 81.32 90.34 / 38.71 85.27 / 52.73 88.37 / 52.51 87.87 / 52.75 84.23 / 52.64 6,900
TCoffee 2.46 61.48 / 33.63 93.04 / 82.36 91.71 / 39.68 81.61 / 48.87 89.22 / 52.90 89.03 / 57.13 84.56 / 52.76 (210,000)
Iterative refinement methods
MAFFT 5.662 FFT-NS-i 58.87 / 33.47 91.64 / 80.11 89.54 / 40.37 83.27 / 49.97 87.11 / 47.37 86.27 / 52.44 82.95* / 50.97* 2,800
Muscle 3.52 (most accurate option) 56.62 / 30.87 90.96 / 79.59 88.90 / 35.17 81.07 / 37.87 85.90 / 45.06 85.17 / 46.19 81.67** / 46.79* 3,400
PRRN 3.11 58.21 / 34.74 92.16 / 79.20 90.46 / 41.66 82.68 / 47.83 85.93 / 47.98 83.83 / 47.56 82.61* / 50.73* 250,000
MAFFT 3.89 † FFT-NS-i 54.56 / 30.26 90.78 / 78.61 90.12 / 37.46 82.65 / 49.33 87.83 / 50.76 85.65 / 49.31 82.16* / 50.27* 3,600
ClustalW 2.0 (Iteration=tree; Sep, 2007) 49.94 / 25.08 88.91 / 75.32 85.80 / 21.61 72.78 / 30.43 81.20 / 40.84 76.49 / 35.06 76.67** / 39.58** (58,000)
Progressive methods
Kalign 1.0 54.51 / 27.79 91.17 / 78.59 87.79 / 29.56 79.69 / 35.47 83.02 / 42.57 84.59 / 44.75 80.25** / 44.00** 480
MAFFT 5.662 FFT-NS-2 51.80 / 26.92 88.79 / 71.55 88.61 / 36.76 80.78 / 40.17 84.57 / 40.06 83.59 / 46.81 79.88** / 44.01** 250
MAFFT 5.662 FFT-NS-1 50.15 / 22.76 88.16 / 72.32 88.03 / 32.98 79.47 / 34.37 82.96 / 41.92 81.18 / 42.06 78.63** / 42.00** 140
Muscle 3.52 (fastest option) 53.36 / 26.97 88.79 / 72.32 86.39 / 29.37 77.74 / 32.93 79.38 / 34.47 76.59 / 35.56 77.63** / 39.71** 160
ClustalW 1.83 50.06 / 22.74 86.43 / 71.14 85.20 / 21.98 72.50 / 27.23 78.82 / 39.55 74.24 / 30.75 75.34** / 37.35** 2,000
The SP and TC scores are shown for each method. † Obsolete version of MAFFT. For the iterative options of MAFFT and Muscle, the maximum number of iterations was set at 1,000. The significance of the difference from the most accurate method is indicated by * (p<0.05) or ** (p<0.01) (Wilcoxon test), for the overall average only.
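For readers unfamiliar with the scoring: the SP (sum-of-pairs) score is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and the TC (total column) score is the fraction of reference columns reproduced exactly. A minimal sketch follows; this is illustrative only, not the bali_score program actually used for BAliBASE, which restricts scoring to core columns:

```python
from itertools import combinations

def residue_maps(alignment):
    # For each gapped sequence, map column index -> ungapped residue index.
    maps = []
    for seq in alignment:
        m, r = {}, 0
        for col, ch in enumerate(seq):
            if ch != "-":
                m[col] = r
                r += 1
        maps.append(m)
    return maps

def aligned_pairs(alignment):
    # Set of ((seq_i, res_i), (seq_j, res_j)) residue pairs sharing a column.
    maps = residue_maps(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        present = [(i, maps[i][col]) for i in range(len(alignment)) if col in maps[i]]
        pairs.update(combinations(present, 2))
    return pairs

def columns(alignment):
    # Each column as a tuple of residue indices (None where the row has a gap).
    maps = residue_maps(alignment)
    return [tuple(m.get(col) for m in maps) for col in range(len(alignment[0]))]

def sp_tc(reference, test):
    ref_pairs = aligned_pairs(reference)
    sp = len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)
    test_cols = set(columns(test))
    ref_cols = columns(reference)
    tc = sum(c in test_cols for c in ref_cols) / len(ref_cols)
    return sp, tc
```

An alignment identical to the reference scores 1.0 on both measures; both scores drop as aligned residue pairs and whole columns diverge from the reference.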

BAliBASE version 3 (homologous regions only)

This dataset is similar to the old version of BAliBASE (v2). Global algorithms are suitable for globally alignable datasets like this one. In particular, ProbCons outperforms the other methods, including the global option of MAFFT (G-INS-i).

Method Ref11 Ref12 Ref2 Ref3 Ref4 Ref5 Overall average CPU time (s)
ProbCons 1.10 81.03 / 63.08 95.04 / 87.07 95.75 / 60.44 90.68 / 65.17 - / - 90.91 / 60.53 90.89 / 68.86 15,000
MAFFT 5.667 G-INS-i 77.02 / 56.95 93.69 / 84.30 95.43 / 58.66 90.15 / 64.77 - / - 90.59 / 58.73 89.44 / 66.08 1,100
MAFFT 5.667 L-INS-i 75.20 / 55.34 93.88 / 84.43 94.81 / 53.85 89.20 / 63.57 - / - 90.05 / 59.33 88.70 / 64.42 970
MAFFT 5.667 E-INS-i 72.42 / 51.79 93.81 / 84.20 94.56 / 51.98 88.94 / 62.97 - / - 90.25 / 59.53 87.97* / 63.01* 1,400
MAFFT 5.667 FFT-NS-i 74.57 / 55.24 92.73 / 82.95 95.03 / 55.78 88.37 / 60.83 - / - 88.19 / 54.80 88.00** / 63.59* 300
Muscle 3.52 (most accurate option) 73.82 / 51.07 92.67 / 81.95 94.91 / 55.63 87.57 / 57.90 - / - 87.57 / 57.90 87.54** / 61.60** 360
The SP and TC scores are shown for each method. For the iterative options of MAFFT and Muscle, the maximum number of iterations was set at 1,000. The significance of the difference from the most accurate method is indicated by * (p<0.05) or ** (p<0.01) (Wilcoxon test), for the overall average only.

PREFAB version 3

As expected, the differences among methods are large when sequence similarity is low.

Method %id: 0-20 %id: 20-40 %id: 40-70 %id: 70-100 Overall CPU time (s)
MAFFT 5.662 L-INS-i 48.55 82.98 * 96.57 98.66 69.77 14,000
MAFFT 5.662 E-INS-i 48.61 83.00 * 95.71 98.66 69.75 * 19,000
ProbCons 1.10 47.58 83.43 96.14 98.10 69.43 210,000
MAFFT 5.662 G-INS-i 46.90 82.69 * 96.36 98.60 68.89 18,000
TCoffee 2.46 44.51 ** 81.84 ** 95.46 ** 98.63 67.42 **
MAFFT 5.662 FFT-NS-i 45.72 ** 81.00 ** 95.52 ** 98.67 67.65 ** 4,500
Muscle 3.52 (most accurate option) 43.06 ** 80.60 ** 94.90 ** 98.44 66.26 ** 8,600
MAFFT 3.89 † FFT-NS-i 42.14 ** 79.31 ** 95.31 ** 98.63 65.33 ** 6,900
ClustalW 1.83 33.96 ** 74.14 ** 93.54 ** 97.85 59.45 ** 13,000
The fraction of correctly aligned sites (%) is shown for each method. † Obsolete version of MAFFT; the difference from the new FFT-NS-i (v5.662) is only in the parameters (--jtt 200 --op 2.4 --ep 0.06 in v3.89, whereas --bl 62 --op 1.53 --ep 0.123 in v5.662). * p<0.05; ** p<0.01 (Wilcoxon test)
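The two parameter sets in the footnote correspond to command lines roughly like the following (a sketch: FFT-NS-i is selected here via --retree 2 --maxiterate 1000, and the file names are placeholders):

```shell
# Old (v3.8x) parameters: JTT 200 scoring matrix, stiffer gap penalties
mafft --retree 2 --maxiterate 1000 --jtt 200 --op 2.4 --ep 0.06 input.fasta > old.aln

# New (v5.6x) parameters: BLOSUM62 scoring matrix
mafft --retree 2 --maxiterate 1000 --bl 62 --op 1.53 --ep 0.123 input.fasta > new.aln
```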

HOMSTRAD (55 entries)

The advantage of MAFFT becomes larger when dozens or hundreds of sequences are aligned simultaneously. The outline of this test is as follows (see the MAFFT version 5 paper for details): (1) A number (n) of homologues (n=20, 50, 100) were collected from SwissProt using BLAST (E < 10^-10). (2) The homologues were aligned together with the original input sequences. (3) The homologues were then removed from the resulting alignment. (4) The accuracy was computed by comparing the reference alignment with the resulting alignment from which the homologues had been removed.
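Steps (3) and (4) above can be sketched as follows; the function names are illustrative, and accuracy is simplified here to the fraction of reference-aligned residue pairs recovered (a two-sequence version of the actual criterion):

```python
def strip_homologues(alignment, keep):
    # (3) Keep only the original input rows, then drop any column that
    # becomes all-gap once the homologue rows are removed.
    rows = [alignment[i] for i in keep]
    cols = [c for c in zip(*rows) if any(ch != "-" for ch in c)]
    return ["".join(col[i] for col in cols) for i in range(len(rows))]

def aligned_pairs(a, b):
    # Residue-index pairs placed in the same column of a two-sequence alignment.
    pairs, i, j = set(), 0, 0
    for x, y in zip(a, b):
        if x != "-" and y != "-":
            pairs.add((i, j))
        if x != "-":
            i += 1
        if y != "-":
            j += 1
    return pairs

def accuracy(ref_a, ref_b, test_a, test_b):
    # (4) Fraction of residue pairs aligned in the reference that are
    # also aligned in the test alignment.
    ref = aligned_pairs(ref_a, ref_b)
    return len(ref & aligned_pairs(test_a, test_b)) / len(ref)
```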

Method n=0 n=20 n=50 n=100
ProbCons 1.10 46.65 50.17 51.07 51.62
MAFFT 5.662 L-INS-i 46.19 51.26 53.67 54.14
MAFFT 5.662 E-INS-i 44.85 50.16 52.77 53.33
MAFFT 5.662 G-INS-i 42.82 * 51.28 53.87 55.37
TCoffee 2.46 42.20 ** 47.39 ** 48.80 ** 48.50 **
MAFFT 5.662 FFT-NS-i 43.34 ** 49.49 50.65 ** 50.74 **
Muscle 3.52 (most accurate option) 43.14 ** 46.39 ** 47.05 ** 48.87 **
MAFFT 3.89 † FFT-NS-i 38.21 ** 44.26 ** 45.52 ** 42.85 **
ClustalW 1.83 36.77 ** 36.57 ** 37.33 ** 36.77 **
† Obsolete version of MAFFT; the difference from the new FFT-NS-i (v5.662) is only in the parameters (--jtt 200 --op 2.4 --ep 0.06 in v3.89, whereas --bl 62 --op 1.53 --ep 0.123 in v5.662).

SABmark version 1.65

The SABmark test shows results similar to those for HOMSTRAD. However, two parameters (--op and --ep) of MAFFT were determined by applying the FFT-NS-2 algorithm to a subset of SABmark, and that training set is included in this analysis, so the comparison might be biased toward MAFFT. L-INS-i and E-INS-i have four and six additional parameters, respectively. These additional parameters have not yet been closely tuned, so the accuracy values of L-INS-i could be increased further. This parameter set has not yet been adopted as the default.

Method n=0 n=20 n=50
MAFFT 5.662 L-INS-i 44.62 (19.30) 48.79 (22.04) 49.65 (22.29)
MAFFT 5.662 E-INS-i 42.53 (17.74) 45.99 (19.66) 48.94 (21.64)
MAFFT 5.662 G-INS-i 42.25 (18.38) 47.69 (21.88) 51.47 (23.53)
TCoffee 2.46 42.48 (17.74) 42.93 (17.45) 44.12 (17.41)
ProbCons 1.10 43.03 (17.28) 42.34 (16.82) 42.22 (16.74)
MAFFT 5.662 FFT-NS-i 37.11 (15.28) 40.13 (17.42) 44.48 (18.61)
Muscle 3.52 (most accurate option) 36.53 (14.39) 36.74 (15.55) 38.33 (16.41)
MAFFT 3.89 † FFT-NS-i 30.59 (11.45) 34.01 (13.52) 34.68 (14.62)
The fD values are shown for the SABmark tests. † Obsolete version of MAFFT; the difference from the new FFT-NS-i (v5.662) is only in the parameters (--jtt 200 --op 2.4 --ep 0.06 in v3.89, whereas --bl 62 --op 1.53 --ep 0.123 in v5.662).

OXBench

For the master subset of OXBench, MAFFT-L-INS-i is less accurate than Muscle, ProbCons and ClustalW, and the differences between MAFFT and the other methods are statistically significant. In contrast, L-INS-i significantly outperforms the other methods on the extended subset of OXBench.

Method master (∼5.7 seqs) extended (∼99 seqs) full (∼6.0 seqs)
Muscle 3.52 (most accurate option) 84.37 / 7.306 86.76 ** / 7.339 ** 74.22 ** / 6.750
ProbCons 1.10 84.12 / 7.261 ** 87.79 ** / 7.108 ** 75.14 / 6.753
MAFFT 5.662 L-INS-i 83.01 ** / 7.226 ** 88.80 / 7.409 74.75 ** / 6.761
MAFFT 3.89 † FFT-NS-i 82.77 ** / 7.210 ** 86.09 ** / 7.302 ** 73.83 ** / 6.677 **
ClustalW 1.83 83.80 / 7.300 83.29 ** / 7.178 ** 72.68 ** / 6.691
The column / STAMP scores are shown for each method. † Obsolete version of MAFFT; the difference from the new FFT-NS-i (v5.662) is only in the parameters (--jtt 200 --op 2.4 --ep 0.06 in v3.89, whereas --bl 62 --op 1.53 --ep 0.123 in v5.662).

IRMBase

IRMBase is a benchmark for situations entirely different from those of BAliBASE. Each reference alignment of IRMBase has short conserved motif(s) embedded in long unalignable regions. The number of motifs is 1, 2 or 3 in ref1, ref2 and ref3, respectively. In such situations, DIALIGN-T and E-INS-i work well. For ref3, E-INS-i works well because of its generalized affine gap costs. E-INS-i also works well for ref1 because of its parameters (--ep 0 in E-INS-i), not because of the algorithm.

Method Ref1 Ref2 Ref3
DIALIGN-T 83.02 80.86
MAFFT 5.662 E-INS-i 87.64 80.42
MAFFT 5.662 L-INS-i 76.50 63.07
TCoffee 2.46 68.08 65.43
ProbCons 1.10 51.80 60.65
MAFFT 5.662 FFT-NS-i 34.60 47.86
Muscle 3.52 (fastest option) 34.60 45.99
Muscle 3.52 (most accurate option) 10.88 25.27
MAFFT 3.89 † FFT-NS-i 58.20 68.41
MAFFT 5.662 G-INS-i 44.83 32.95
ClustalW 1.83 0 5.296
The fractions of correctly aligned sites (%) are shown.