Add new sequences to an existing alignment using MAFFT

Due to a configuration change on May 22, 2025, the performance of this service should have improved slightly during periods of high load. Please let us know if you notice any side effects.

To extract a short region from a set of long unaligned sequences, try another function (2024/Nov).

Add new sequence(s) to reference Help

Reference: Example
Gaps (-) will be preserved.

or upload a plain text file: Clear
Zipped file is acceptable.

New sequence(s) to be added to the reference above: Example
Gaps (if any) will be removed.

or upload a plain text file: Clear
Zipped file is acceptable.
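
The different gap handling of the two input boxes can be sketched in Python. This is only an illustration of the behavior described above, not the service's own code; the helper names read_fasta and strip_gaps and the sample records are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (title, sequence) pairs."""
    records = []
    title, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


def strip_gaps(records):
    """Remove gap characters (-) from every sequence (new sequences only)."""
    return [(title, seq.replace("-", "")) for title, seq in records]


# Reference alignment: gaps are kept exactly as entered.
reference = read_fasta(">ref1\nAC-GT\n>ref2\nACCGT\n")
# New sequences: any gaps are dropped before alignment.
new = strip_gaps(read_fasta(">q1\nA--CG\nT\n"))
```

Here the reference rows keep their alignment columns, while the new sequence q1 is reduced to its ungapped letters.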

Allow unusual symbols (Selenocysteine "U", Inosine "i", non-alphabetical characters, etc.) Help

UPPERCASE / lowercase:
Same as input
Amino acid → UPPERCASE / Nucleotide → lowercase

Direction of nucleotide sequences:
Same as input
Adjust direction according to the first sequence (accurate enough for most cases) Beta
Adjust direction according to the first sequence (only for highly divergent data; very slow) Beta
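
As a rough illustration of what adjusting direction according to the first sequence means, the sketch below compares shared 6-mers between the first sequence and each orientation of another sequence, and keeps the better one. This is an assumed, simplified heuristic for illustration only; it is not MAFFT's actual procedure, and the function names are hypothetical.

```python
# Complement table for plain nucleotide letters (ambiguity codes are left as-is).
_COMPLEMENT = str.maketrans("acgtuACGTU", "tgcaaTGCAA")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k=6):
    """Return the set of k-mers in seq, ignoring case."""
    seq = seq.lower()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def adjust_direction(first, other, k=6):
    """Return other, or its reverse complement if that shares more k-mers with first."""
    ref = kmers(first, k)
    flipped = reverse_complement(other)
    forward_hits = len(ref & kmers(other, k))
    reverse_hits = len(ref & kmers(flipped, k))
    return flipped if reverse_hits > forward_hits else other


first = "acgtacggattacagctagc"
# A sequence entered in the opposite direction is flipped back.
fixed = adjust_direction(first, reverse_complement(first))
```

A k-mer comparison like this is fast but can fail on highly divergent data, which is consistent with the form offering a slower alternative for that case.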

Output order:
Same as input
Aligned

Sequence title:
Same as input
Insert "New|" at the head of the title of each new sequence

Title length in Clustal format (only first word is used as title):
(10 – 100)
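
The two title options above can be sketched as follows. The function names are hypothetical, and the assumption that a long first word is truncated to the chosen length (rather than handled some other way) is mine, not stated by the form.

```python
def mark_new(title):
    """Prefix a new sequence's title with "New|"."""
    return "New|" + title


def clustal_title(title, length=10):
    """Return the title as it might appear in Clustal output.

    Only the first whitespace-separated word is used.
    Assumption: the word is cut to at most `length` characters.
    """
    if not 10 <= length <= 100:
        raise ValueError("title length must be between 10 and 100")
    words = title.split()
    first_word = words[0] if words else ""
    return first_word[:length]


marked = mark_new("hCoV-19/sample 2022")
short = clustal_title(marked, 10)
```

With a length of 10, titles that share the same first ten characters would no longer be distinguishable, which is one reason to raise the limit.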

Job name (optional):
(basic Latin letters, numbers and spaces only)

Notify when finished (optional; recommended when submitting large data):
Email address:

Options specifically for SARS-CoV-2 2022/Mar

Add full-length genomes of SARS-CoV-2 to MSAs by GISAID EpiCoV:

With this switch, the service:

Uses the reference data corresponding to the MSA selected above.
Sets the same flags (--compactmapout, --maxambiguous and --addtotop) as the calculation in GISAID.

Just enter your new sequences in the New sequence(s) box.

The resulting alignment can be concatenated to all or part of GISAID's MSA to incorporate your new sequences into it. The GISAID MSA must be downloaded separately from the original site.

Insert gaps at codon boundaries where possible. Works only with GISAID's references.

--addtotop: Use only the top sequence as reference.

Advanced settings

Ambiguous letters:
Remove sequences that have more ambiguous letters than:

and replace successive ns (nucleotide) or Xs (protein) in new sequences with a single n or X.
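
The filtering and collapsing described above can be sketched in Python for nucleotide data. Two assumptions are made here and are not confirmed by the form: the threshold is treated as a fraction of the sequence length, and every letter other than a, c, g, t, u counts as ambiguous. All names are hypothetical.

```python
import re

# Assumption: anything outside these letters is "ambiguous" in nucleotide data.
UNAMBIGUOUS_NT = set("acgtu")


def ambiguous_fraction(seq):
    """Fraction of letters in seq that are not a, c, g, t or u."""
    letters = [c for c in seq.lower() if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c not in UNAMBIGUOUS_NT for c in letters) / len(letters)


def collapse_runs(seq, symbol="n"):
    """Replace each run of `symbol` with a single occurrence (use "X" for protein)."""
    return re.sub(re.escape(symbol) + "+", symbol, seq)


def filter_and_collapse(records, max_ambiguous, symbol="n"):
    """Drop records above the threshold, then collapse runs in the rest."""
    kept = []
    for title, seq in records:
        if ambiguous_fraction(seq) <= max_ambiguous:
            kept.append((title, collapse_runs(seq, symbol)))
    return kept


records = [("a", "acgtnnnnacgt"), ("b", "nnnnnnnnac")]
cleaned = filter_and_collapse(records, max_ambiguous=0.5)
```

In this example sequence b is removed because 8 of its 10 letters are n, and the run of four n's in sequence a becomes a single n.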

Keep alignment length:
Yes
With this option, insertions in the new sequences are deleted to keep the alignment length the same as that of the input alignment.
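
The effect of keeping the alignment length can be pictured as deleting every column that exists only because of an insertion in a new sequence. The sketch below post-processes an already computed alignment to show the idea; it is not how MAFFT implements the option, it assumes the input reference alignment had no gap-only columns, and the function name is hypothetical.

```python
def keep_length(aligned_reference, aligned_new):
    """Drop columns in which every reference row has a gap.

    Such columns were opened by insertions in new sequences, so removing
    them restores the original reference alignment length.
    """
    width = len(aligned_reference[0])
    keep = [i for i in range(width)
            if any(row[i] != "-" for row in aligned_reference)]

    def trim(row):
        return "".join(row[i] for i in keep)

    return ([trim(row) for row in aligned_reference],
            [trim(row) for row in aligned_new])


# Original reference alignment was AC-GT / ACCGT (5 columns).
# Adding ACCAGT opened a sixth column for its extra A.
ref_rows = ["AC--GT", "ACC-GT"]
new_rows = ["ACCAGT"]
ref_kept, new_kept = keep_length(ref_rows, new_rows)
```

After trimming, the reference rows are back to five columns and the inserted A in the new sequence is lost, which is the trade-off this option makes.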

--compactmapout: Output the positions of insertions in added sequences and in the reference alignment. Updated (2021/Dec)

↑ Failed when the "Allow unusual symbols" option was on, from Jan/18 until the fix on Jan/20, 2022.

Strategy:
Auto (--multipair or --6merpair; depends on data size)

--6merpair (Fast)
--multipair --weighti 0 (Intermediate)
--multipair (Accurate)
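
For readers who run MAFFT locally, the three explicit strategies above map onto command-line flags roughly as in this sketch. The flag strings are taken from the list above and the --add usage follows MAFFT's documented form; the function name, the strategy keys and the file names are hypothetical. The Auto choice is omitted because its data-size rule is not given here. The command is only built, not executed.

```python
import shlex

# Flags copied from the strategy list above.
STRATEGY_FLAGS = {
    "fast": ["--6merpair"],
    "intermediate": ["--multipair", "--weighti", "0"],
    "accurate": ["--multipair"],
}


def build_add_command(new_path, reference_path, strategy):
    """Return an argument list for adding new sequences to a reference alignment."""
    if strategy not in STRATEGY_FLAGS:
        raise ValueError(f"unknown strategy: {strategy}")
    return ["mafft", *STRATEGY_FLAGS[strategy], "--add", new_path, reference_path]


cmd = build_add_command("new.fa", "reference.fa", "intermediate")
command_line = shlex.join(cmd)  # printable form; redirect stdout to save the result
```

Building the command as a list keeps file names with spaces safe if it is later passed to subprocess.run.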

Parameters:
Scoring matrix for amino acid sequences:
Scoring matrix for nucleotide sequences:

↑ Switch it to '1PAM / κ=2' when aligning closely related DNA sequences.

Gap opening penalty: (1.0 – 5.0)
Offset value: (0.0 – 1.0)

↑ If long gaps are not expected, set it to 0.1 or a larger value.
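
When running MAFFT locally, the two values above correspond to the --op (gap opening penalty) and --ep (offset value) options. The sketch below validates them against the ranges shown on this form before turning them into flags; the function name is hypothetical and the range check mirrors this form only, not any limit enforced by MAFFT itself.

```python
def penalty_flags(gap_open, offset):
    """Return --op/--ep flags after checking the ranges shown on the form."""
    if not 1.0 <= gap_open <= 5.0:
        raise ValueError("gap opening penalty must be between 1.0 and 5.0")
    if not 0.0 <= offset <= 1.0:
        raise ValueError("offset value must be between 0.0 and 1.0")
    return ["--op", str(gap_open), "--ep", str(offset)]


# Example: discourage long gaps by raising the offset to 0.1.
flags = penalty_flags(1.53, 0.1)
```

The resulting list can be placed before the input file arguments of a mafft command.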

Score of N in nucleotide data: Example

↓ Long stretches of Ns tend to be gapped (excluded from the alignment).

(nzero) N has no effect on the alignment score.

(nwildcard) N is treated like a wildcard. Experimental option (2016/Apr/26)

↑ Try this if Ns should be aligned with usual letters.

Multiple alignment program for amino acid or nucleotide sequences
