MAFFT version 7

Multiple alignment program for amino acid or nucleotide sequences

--addfull and --add: Adding unaligned full-length sequence(s) into an existing alignment Updated! (2015/May)

% mafft --add new_sequences --reorder existing_alignment > output

--addfragments: Adding unaligned fragmentary sequence(s) into an existing alignment Updated! (2015/May)

Accurate option:

% mafft --addfragments fragments --reorder --thread -1 existing_alignment > output
Fast option (accurate enough for highly similar sequences):
% mafft --addfragments fragments --reorder --6merpair --thread -1 existing_alignment > output
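
When these commands are launched from a script, the two variants differ only in the --6merpair flag. The following is a minimal Python sketch, assuming that mafft is installed and on the PATH. The helper names build_addfragments_cmd and run_addfragments and the fast parameter are hypothetical; only the flags shown above are used.

```python
import subprocess

def build_addfragments_cmd(fragments, existing_alignment, fast=False):
    """Return the argument list for the accurate or the fast --addfragments run."""
    cmd = ["mafft", "--addfragments", fragments, "--reorder"]
    if fast:
        # Fast option: accurate enough for highly similar sequences.
        cmd.append("--6merpair")
    cmd += ["--thread", "-1", existing_alignment]
    return cmd

def run_addfragments(fragments, existing_alignment, output, fast=False):
    """Run mafft and write its standard output to `output` (needs mafft installed)."""
    with open(output, "w") as handle:
        subprocess.run(
            build_addfragments_cmd(fragments, existing_alignment, fast),
            stdout=handle,
            check=True,  # raise an error if mafft exits with a non-zero status
        )
```

Calling run_addfragments("fragments", "existing_alignment", "output", fast=True) would correspond to the fast command above.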

Example New! (2015/May)

When existing_alignment is
>seq1
ACCDEFGHI-K
>seq2
A--DEFGHI-K
and a sequence to be added (newseq) is
>newseq
ACCDPQRSTEFG
then the result of mafft --add is
% mafft --add newseq existing_alignment

seq1            ACCD-----EFGHIK
seq2            A--D-----EFGHIK
newseq          ACCDPQRSTEFG---
                *  *     ***
The alignment length changes (11→15) in this case, because PQRST is inserted (+5) and a gap-only column (between I and K) is removed (-1).
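
The length arithmetic can be checked with a short Python sketch. It reproduces only the column counting for this example, not MAFFT's alignment algorithm; the function name drop_gap_only_columns is hypothetical.

```python
def drop_gap_only_columns(rows):
    """Remove every column in which all rows contain a gap ('-')."""
    keep = [i for i in range(len(rows[0])) if any(row[i] != "-" for row in rows)]
    return ["".join(row[i] for i in keep) for row in rows]

existing = ["ACCDEFGHI-K", "A--DEFGHI-K"]   # existing alignment, length 11
compact = drop_gap_only_columns(existing)    # the column between I and K is dropped
inserted = "PQRST"                           # residues of newseq with no counterpart

new_length = len(compact[0]) + len(inserted)
print(len(existing[0]), "->", new_length)    # prints: 11 -> 15
```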

With the --keeplength option (supported in versions ≥7.228), the insertion PQRST is removed and the alignment length is kept unchanged (11→11).

% mafft --keeplength --add newseq existing_alignment

seq1            ACCDEFGHI-K
seq2            A--DEFGHI-K
newseq          ACCDEFG----
                *  ****
With the --mapout option, a correspondence table of positions is output to the newseq.map file.
% mafft --mapout --add newseq existing_alignment
% cat newseq.map

>newseq
# letter, position in the original sequence, position in the reference alignment
A, 1, 1
C, 2, 2
C, 3, 3
D, 4, 4
P, 5, -
Q, 6, -
R, 7, -
S, 8, -
T, 9, -
E, 10, 5
F, 11, 6
G, 12, 7
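
The table can be post-processed with a few lines of Python. The sketch below assumes the three-field, comma-separated layout shown above and a single sequence per file; the function names are hypothetical. It lists the residues that have no column in the reference alignment and rebuilds the row that newseq occupies in the reference coordinates, which matches the --keeplength result in this example.

```python
MAP_TEXT = """\
>newseq
# letter, position in the original sequence, position in the reference alignment
A, 1, 1
C, 2, 2
C, 3, 3
D, 4, 4
P, 5, -
Q, 6, -
R, 7, -
S, 8, -
T, 9, -
E, 10, 5
F, 11, 6
G, 12, 7
"""

def parse_map(text):
    """Return (letter, original_position, reference_position or None) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith((">", "#")):
            continue  # skip the sequence name and the column description
        letter, orig_pos, ref_pos = [field.strip() for field in line.split(",")]
        records.append((letter, int(orig_pos), None if ref_pos == "-" else int(ref_pos)))
    return records

def insertions(records):
    """Residues that have no corresponding column in the reference alignment."""
    return "".join(letter for letter, _, ref in records if ref is None)

def row_in_reference(records, reference_length):
    """Place the mapped residues into a gap-filled row of the reference length."""
    row = ["-"] * reference_length
    for letter, _, ref in records:
        if ref is not None:
            row[ref - 1] = letter  # positions in the map are 1-based
    return "".join(row)

records = parse_map(MAP_TEXT)
print(insertions(records))            # PQRST
print(row_in_reference(records, 11))  # ACCDEFG----
```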
Bug information: Versions ≤7.154 had a bug in --addfragments. When the sequences in the reference alignment were almost identical to each other, an incorrect result was occasionally returned. This bug was fixed in version 7.157 (2014/Jun/10).
Bug information: Versions 6.923 - 6.950 had a bug in the combination of --addfragments and --reorder. The order of sequences in the output was incorrect. This bug was fixed in version 6.951 (2012/Oct/18).
Bug information: Versions ≤6.815 had a problem in processing partial sequences. When the new sequence had domains A and B but some of the sequences in the existing alignment lacked domain B, domain B was sometimes not aligned. This problem was fixed in version 6.817 (2010/Aug/14).

Possible misalignment by versions ≤6.815:
existing alignment: AAAAAAAA---------BBBBBBBBB
existing alignment: AAAAAAAA---------BBBBBBBBB
existing alignment: AAAAAAAA------------------
new sequence:       AAAAAAAABBBBBBBBB---------

Fixed in versions ≥6.817:

existing alignment: AAAAAAAABBBBBBBBB
existing alignment: AAAAAAAABBBBBBBBB
existing alignment: AAAAAAAA---------
new sequence:       AAAAAAAABBBBBBBBB

Adding aligned sequences (profile) into an existing alignment

% mafft --addprofile aligned_sequences existing_alignment > output

Difference from the --seed option

Difference from the mafft-profile program

The --addprofile option covers all the situations where the mafft-profile program was used. Moreover, --addprofile is applicable to larger datasets than mafft-profile. Therefore, the mafft-profile program will be removed in a future release.

The mafft-profile program assumes that each profile separately forms a monophyletic cluster.