MAFFT version 7

Multiple alignment program for amino acid or nucleotide sequences

(not yet completed, 2016/Aug)

Available in versions ≥7.294.


This script partially applies the iterative refinement technique to a large number of short sequences (more than ∼10,000 sequences × fewer than ∼5,000 sites excluding gaps; assumed to be highly conserved).  (0) A small number of core sequences are randomly selected and (1) aligned by an iterative refinement option.  Then (2) the remaining sequences are progressively added to the core alignment with the --add option.  See Yamada et al. (2016) for details. 
% mafft-sparsecore.rb -i in > out
The Ruby interpreter is needed to run this script on the command line.  An online service is also available.
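
Before running the script, it may help to check whether a dataset falls roughly within the size range described above. The following is a minimal Ruby sketch and is not part of MAFFT: the method names fasta_stats and suits_sparsecore? are hypothetical, and the thresholds are simply the approximate figures quoted above, not hard limits enforced by the script.

```ruby
# Hypothetical helper, not part of MAFFT: summarize FASTA-formatted text.
# Returns [number of sequences, longest sequence length excluding gaps].
def fasta_stats(text)
  count = 0
  longest = 0
  current = 0
  text.each_line do |line|
    line = line.chomp
    if line.start_with?(">")
      # A title line closes the previous record and opens a new one.
      count += 1
      longest = current if current > longest
      current = 0
    else
      # Count residues only; gap characters and whitespace are ignored.
      current += line.gsub(/[-\s]/, "").length
    end
  end
  longest = current if current > longest
  [count, longest]
end

# Approximate range quoted in the text: more than ~10,000 sequences,
# fewer than ~5,000 sites excluding gaps. Treat the result as a rough guide.
def suits_sparsecore?(count, longest)
  count > 10_000 && longest < 5_000
end
```

For a file named in, the check could be run as suits_sparsecore?(*fasta_stats(File.read("in"))). Whether the sequences are also highly conserved is not tested by this sketch.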

(0) Selection of core sequences

Two parameters, p and n, control how core sequences are selected.
% mafft-sparsecore.rb -i in -p p -n n% > out

If specific sequences should be included in the core alignment, mark their title lines in the input file with ">_focus_" (experimental feature).

>_focus_ Important sequence
>_focus_ Important sequence
> Less important sequence
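
Marking title lines by hand is impractical for large inputs, so a small helper can do it. The following Ruby sketch is not part of MAFFT: the method name mark_focus and the idea of selecting titles by substring are illustrative assumptions. It also assumes that placing "_focus_" immediately after ">" is sufficient, as the example above suggests.

```ruby
# Hypothetical helper, not part of MAFFT: prefix chosen title lines with "_focus_".
# `wanted` is a list of substrings; a title containing any of them is marked.
def mark_focus(text, wanted)
  text.each_line.map do |line|
    if line.start_with?(">") && !line.start_with?(">_focus_") &&
       wanted.any? { |w| line.include?(w) }
      # Insert the marker right after the leading ">".
      line.sub(">", ">_focus_")
    else
      # Sequence lines, unmatched titles and already marked titles pass through.
      line
    end
  end.join
end
```

For example, File.write("in.focus", mark_focus(File.read("in"), ["human"])) would write a copy of the input in which every title containing "human" is marked; the file name in.focus is arbitrary.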

(1) Iterative refinement stage

The G-INS-i strategy is used by default for the iterative refinement calculation.  Options for this stage can be specified using -C.  To restrict the number of iterations to two,
% mafft-sparsecore.rb -i in -C '--maxiterate 2' > out

To use the L-INS-i strategy (which uses local pairwise alignments),

% mafft-sparsecore.rb -i in -C '--localpair' > out

(2) Progressive stage

Options for this stage can be specified using -A.  To use a memory-saving tree (for more than 100,000 sequences),
% mafft-sparsecore.rb -i in -A '--memsavetree' > out

To omit the recalculation of the guide tree,

% mafft-sparsecore.rb -i in -A '--retree 1' > out
(faster and less accurate than the default)

Other options

By default, the sequences are re-ordered according to similarity.  To keep the input order,
% mafft-sparsecore.rb -i in -o inputorder > out

If the input data contains reverse complementary sequences (nucleotide only),

% mafft-sparsecore.rb -i in -D '--adjustdirection' > out

To specify the seed of random numbers,

% mafft-sparsecore.rb -i in -s seed > out
seed=0 by default.
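
When several of the options above are used together from another Ruby program, it can be convenient to assemble the argument list programmatically. The following is a minimal sketch and not part of MAFFT: the method name sparsecore_command and its keyword arguments are hypothetical, it uses only the flags shown on this page, and it assumes these flags can be combined in a single invocation.

```ruby
# Hypothetical wrapper, not part of MAFFT: build an argument list for
# mafft-sparsecore.rb using only the options described on this page.
def sparsecore_command(input, core_opts: nil, add_opts: nil, keep_order: false, seed: nil)
  cmd = ["mafft-sparsecore.rb", "-i", input]
  cmd += ["-C", core_opts] if core_opts      # options for the iterative refinement stage
  cmd += ["-A", add_opts] if add_opts        # options for the progressive stage
  cmd += ["-o", "inputorder"] if keep_order  # keep the input order
  cmd += ["-s", seed.to_s] if seed           # seed of random numbers (0 is a valid seed)
  cmd
end
```

Passing the array to system(*cmd, out: "out") runs the script without going through a shell, so option strings such as '--maxiterate 2' need no extra quoting.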