(not yet completed, 2016/Aug)
Available in versions ≥7.294.
mafft-sparsecore.rb
This script applies the iterative refinement technique to only part of a dataset consisting of a large number of short sequences (more than ∼10,000 sequences × fewer than ∼5,000 sites, excluding gaps; the sequences are assumed to be highly conserved).
(0) A small number of core sequences are randomly selected and (1) aligned with an iterative refinement option.
Then (2) the remaining sequences are progressively added to the core alignment with the --add option.
See Yamada et al. (2016) for details.
% mafft-sparsecore.rb -i in > out
A Ruby interpreter is required to run this script from the command line.
An online service is available here.
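The two alignment stages can be pictured as two ordinary mafft invocations. The following is only a rough sketch, not the script's actual code: the function names and the file names (core.fa, rest.fa, core.aln) are hypothetical, and the commands are built but never executed. It assumes the standard mafft spelling of G-INS-i (--globalpair --maxiterate 1000) for stage (1) and the --add option for stage (2).

```ruby
# Hypothetical sketch: builds the two mafft command lines that roughly
# correspond to stages (1) and (2). File names are placeholders and
# nothing is executed here.

def core_stage_command(core_fasta)
  # G-INS-i in plain mafft: global pairwise alignment plus iterative refinement.
  ["mafft", "--globalpair", "--maxiterate", "1000", core_fasta]
end

def add_stage_command(rest_fasta, core_alignment)
  # --add inserts new sequences into an existing alignment.
  ["mafft", "--add", rest_fasta, core_alignment]
end

puts core_stage_command("core.fa").join(" ")
puts add_stage_command("rest.fa", "core.aln").join(" ")
```

In practice the script handles both steps itself, so these commands are shown only to make the division of work between the stages concrete.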
(0) Selection of core sequences
Two parameters, p and n, control how core sequences are selected.
% mafft-sparsecore.rb -i in -p p -n n% > out
- p is the number of core sequences to be selected. p=500 by default.
- Core sequences are selected from the longest n% of the sequences. n=50 by default.
If specific sequences should be included in the core alignment, mark them with ">_focus_" at the start of their title lines in the input file (experimental feature).
>_focus_ Important sequence
ARNDCQEGHILKMFPSTWYV
>_focus_ Important sequence
ADCQEGHLKMFPSTWYV
> Less important sequence
ARNCQEGHILKFPSTWV
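The selection rules described above can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not the code of mafft-sparsecore.rb: the function names are hypothetical, sequence length is assumed to be measured without gap characters, and it is assumed that sequences marked with "_focus_" count toward p, with the remaining slots filled at random from the longest n% of the other sequences.

```ruby
# Hypothetical sketch of core selection; not the actual mafft-sparsecore.rb code.
# Each record is [title, sequence]; titles are given without the leading ">".

def ungapped_length(seq)
  seq.delete("-").length
end

def select_core(records, p: 500, n: 50, seed: 0)
  rng = Random.new(seed)
  # Sequences marked with "_focus_" are always kept in the core.
  focus, others = records.partition { |title, _| title.start_with?("_focus_") }
  # Candidate pool: the longest n% of the remaining sequences.
  pool_size = (others.size * n / 100.0).ceil
  pool = others.sort_by { |_, seq| -ungapped_length(seq) }.first(pool_size)
  # Fill the remaining core slots by random sampling without replacement
  # (assumption: focus sequences count toward p).
  slots = [p - focus.size, 0].max
  focus + pool.sample(slots, random: rng)
end

records = [
  ["_focus_ Important sequence", "ARNDCQEGHILKMFPSTWYV"],
  ["_focus_ Important sequence", "ADCQEGHLKMFPSTWYV"],
  [" Less important sequence", "ARNCQEGHILKFPSTWV"],
]
core = select_core(records, p: 2, n: 50, seed: 0)
core.each { |title, _| puts title }
```

Because the random generator is created from an explicit seed, calling the function twice with the same input and seed returns the same core set.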
(1) Iterative refinement stage
The G-INS-i strategy is used by default for the iterative refinement calculation.
Options for this stage can be specified using -C.
To restrict the number of iterations to two,
% mafft-sparsecore.rb -i in -C '--maxiterate 2' > out
To use the L-INS-i strategy (uses local pairwise alignments),
% mafft-sparsecore.rb -i in -C '--localpair' > out
(2) Progressive stage
Options for this stage can be specified using -A.
To use a memory-saving tree (for more than 100,000 sequences),
% mafft-sparsecore.rb -i in -A '--memsavetree' > out
To omit the recalculation of the guide tree,
% mafft-sparsecore.rb -i in -A '--retree 1' > out
(faster and less accurate than the default)
Other options
By default, the sequences are re-ordered according to similarity.
To keep the input order,
% mafft-sparsecore.rb -i in -o inputorder > out
If the input data contains reverse complementary sequences (nucleotide only),
% mafft-sparsecore.rb -i in -D '--adjustdirection' > out
To specify the seed for the random number generator,
% mafft-sparsecore.rb -i in -s seed > out
seed=0 by default.