MAFFT alignment and NJ / UPGMA phylogeny

The service was unstable during the following periods due to a configuration change:

2:30 PM – ~~4:00 PM~~ 7:00 PM on June 25, 2025 (JST)
9:00 AM – 10:30 AM on June 26, 2025 (JST)

Several jobs were cancelled during these periods. If you did not receive your job results, please resubmit the job.

To avoid overload, try the lightweight option for virus genomes (2024/Aug).

The email notification sent at the end of a calculation was not working in August 2024. Fixed on Aug 23.

For a large number of short sequences, try an experimental service.

Experimental service for aligning raw reads (Updated, 2023/Nov)

If you need an MSA of only a specific region, then try extracting the region first (2022/Oct). New!

Multiple sequence alignment and NJ / UPGMA phylogeny

Input:
Paste protein or DNA sequences in FASTA format. Example

or upload a plain text file:
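As an illustration of the FASTA input expected here, the following minimal sketch parses FASTA text into (title, sequence) pairs so that a file can be checked before submission. The function name `parse_fasta` and the example sequences are hypothetical and are not part of the service.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (title, sequence) pairs."""
    records = []
    title, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:].strip(), []
        elif title is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


# Made-up example input: two short protein sequences.
example = """>seq1 human
MKTAYIAKQR
QISFVKSHFS
>seq2 mouse
MKTAYIAKQRQISFVK
"""
records = parse_fasta(example)
```

Sequence lines that are wrapped across several lines are joined, so each record holds one continuous sequence string.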

Use DASH to add homologous structures (protein only)

Output original plus DASH sequences

Output original sequences only

Give structural alignment(s) externally prepared

Allow unusual symbols (Selenocysteine "U", Inosine "i", non-alphabetical characters, etc.) Help

UPPERCASE / lowercase:

Same as input

Amino acid → UPPERCASE / Nucleotide → lowercase

Direction of nucleotide sequences: Help

Same as input

Adjust direction according to the first sequence (accurate enough for most cases)

Adjust direction according to the first sequence (only for highly divergent data; extremely slow)
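To make the idea of adjusting direction concrete, here is a rough sketch that reverse-complements a nucleotide sequence when that orientation shares more k-mers with the first sequence. This is only an illustration under simple assumptions; it is not the method MAFFT uses, and the names `reverse_complement`, `kmers` and `orient_like` are hypothetical.

```python
# Complement table for DNA letters, keeping the case of the input.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k=4):
    """Return the set of k-mers in seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_like(reference, seq, k=4):
    """Return seq or its reverse complement, whichever shares more k-mers
    with the reference. A crude heuristic, for illustration only."""
    ref = kmers(reference, k)
    forward = len(ref & kmers(seq, k))
    reverse = len(ref & kmers(reverse_complement(seq), k))
    return reverse_complement(seq) if reverse > forward else seq


reference = "ATGGCCAAGTTGACC"
flipped = reverse_complement("ATGGCCAAGTTGAC")
restored = orient_like(reference, flipped)
```

A k-mer count like this becomes unreliable for highly divergent sequences, which is consistent with the form offering a separate, slower option for such data.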

Output order:

Same as input

Aligned
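For readers who run MAFFT locally, the basic options above have counterparts among the command-line flags. The sketch below assembles those flags from form-like choices; the correspondence between each web option and a flag is an assumption made for illustration, and `basic_option_flags` is a hypothetical helper.

```python
def basic_option_flags(any_symbol=False, preserve_case=False,
                       direction="same", order="input"):
    """Build a list of MAFFT command-line flags from form-like choices.

    The option-to-flag mapping is assumed, not taken from the web service.
    """
    direction_flags = {
        "same": [],                                    # keep input direction
        "adjust": ["--adjustdirection"],
        "adjust_accurately": ["--adjustdirectionaccurately"],
    }
    order_flags = {
        "input": ["--inputorder"],                     # same as input
        "aligned": ["--reorder"],                      # aligned order
    }
    flags = []
    if any_symbol:
        flags.append("--anysymbol")                    # allow unusual symbols
    if preserve_case:
        flags.append("--preservecase")                 # keep case as in input
    flags += direction_flags[direction]
    flags += order_flags[order]
    return flags


flags = basic_option_flags(any_symbol=True, direction="adjust", order="aligned")
```

An unknown `direction` or `order` value raises `KeyError`, which keeps typos from silently producing a default run.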

Title length in Clustal format (only first word is used as title):

(10 – 100)

Job name (optional; used as output file name and subject of emails):

(basic Latin alphabet, number and space only)
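The job-name restriction can be checked before submission with a short pattern. This sketch assumes that "basic Latin alphabet, number and space" means the characters A-Z, a-z, 0-9 and the space character; `is_valid_job_name` is a hypothetical helper.

```python
import re

# Assumed character set: ASCII letters, digits and spaces.
_JOB_NAME = re.compile(r"[A-Za-z0-9 ]*")


def is_valid_job_name(name):
    """Return True if name uses only the assumed allowed characters.

    An empty name is accepted because the field is optional.
    """
    return _JOB_NAME.fullmatch(name) is not None
```

For example, `is_valid_job_name("HA segment 4")` is accepted, while a name containing an underscore or non-ASCII letters is rejected.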

Notify when finished (optional; recommended when submitting large data):

Email address:

Advanced settings

Strategy:

Auto (FFT-NS-1, FFT-NS-2, FFT-NS-i or L-INS-i; depends on data size) Updated

Progressive methods

FFT-NS-1 (Very fast; recommended for >2,000 sequences; progressive method)

FFT-NS-2 (Fast; progressive method)

G-INS-1 (Slow; progressive method with an accurate guide tree)

Iterative refinement methods

FFT-NS-i (Slow; iterative refinement method)

E-INS-i (Very slow; recommended for <200 sequences with multiple conserved domains and long gaps; 2 iterative cycles only) Help Updated (2015/Jun)

L-INS-i (Very slow; recommended for <200 sequences with one conserved domain and long gaps; 2 iterative cycles only) Help

G-INS-i (Very slow; recommended for <200 sequences with global homology; 2 iterative cycles only) Help

Q-INS-i (Extremely slow; secondary structure of RNA is considered; recommended for a global alignment of highly divergent ncRNAs with <200 sequences × <1,000 nucleotides; the number of iterative cycles is restricted to two, 2016/May) Help
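When running MAFFT locally instead of through this form, the strategies listed above correspond roughly to combinations of command-line flags. The sketch below builds an argument list for a chosen strategy. The flag combinations are assumptions based on the general command-line conventions of MAFFT, with `--maxiterate 2` chosen to mirror the "2 iterative cycles only" note; the exact settings used by the web service may differ. Q-INS-i is left out because it is run through a separate script. `build_command` and the file name are hypothetical.

```python
# Assumed strategy-to-flag mapping; the web service may use other settings.
STRATEGY_FLAGS = {
    "Auto": ["--auto"],
    "FFT-NS-1": ["--retree", "1", "--maxiterate", "0"],
    "FFT-NS-2": ["--retree", "2", "--maxiterate", "0"],
    "G-INS-1": ["--globalpair", "--maxiterate", "0"],
    "FFT-NS-i": ["--retree", "2", "--maxiterate", "2"],
    "E-INS-i": ["--genafpair", "--maxiterate", "2"],
    "L-INS-i": ["--localpair", "--maxiterate", "2"],
    "G-INS-i": ["--globalpair", "--maxiterate", "2"],
}


def build_command(strategy, input_path):
    """Return an argument list for a local MAFFT run with the given strategy."""
    if strategy not in STRATEGY_FLAGS:
        raise ValueError(f"unknown strategy: {strategy}")
    return ["mafft", *STRATEGY_FLAGS[strategy], input_path]


# Hypothetical input file name.
command = build_command("L-INS-i", "input.fasta")
```

The resulting list can be passed to `subprocess.run(command, capture_output=True, text=True)`, since MAFFT writes the alignment to standard output.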

Align unrelated segments, too? (in alpha testing, 2014/Mar)
If the input data is expected to be globally conserved but locally contaminated by unrelated segments, try 'Unalignlevel>0' and possibly 'Leave gappy regions'.

Unalignlevel: (0.0 – 0.8)

↑ Default
This feature is available only when G-INS-1 or G-INS-i is selected in the Strategy section above.

Try to align gappy regions anyway

Leave gappy regions (Not recommended for >∼1,000 sequences)

Parameters:

Scoring matrix for amino acid sequences:

Scoring matrix for nucleotide sequences:

↑ Switch it to '1PAM / κ=2' when aligning closely related DNA sequences.

Gap opening penalty: (1.0 – 5.0)

Offset value: (0.0 – 1.0)
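The two numeric parameters above map onto the MAFFT command-line options `--op` (gap opening penalty) and `--ep` (offset value). The sketch below checks values against the ranges shown on this form before building the flags; `gap_flags` is a hypothetical helper, and the example values are chosen only for illustration.

```python
def gap_flags(op, ep):
    """Validate gap parameters against the form's ranges and return flags."""
    if not 1.0 <= op <= 5.0:
        raise ValueError("gap opening penalty must be within 1.0 - 5.0")
    if not 0.0 <= ep <= 1.0:
        raise ValueError("offset value must be within 0.0 - 1.0")
    return ["--op", str(op), "--ep", str(ep)]


# Illustrative values within the allowed ranges.
flags = gap_flags(1.53, 0.123)
```

Rejecting out-of-range values early avoids submitting a job that the form itself would not accept.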

Score of N in nucleotide data: Example

↓ Long stretches of Ns tend to be gapped (excluded from the alignment).

(nzero) N has no effect on the alignment score.

(nwildcard) N is treated like a wildcard. Experimental option (2016/Apr/26)

↑ Try this if Ns should be aligned with usual letters.

Guide tree:

Default UPGMA

Output guide tree

To display the tree, follow the “Refine dataset” link in the result page.
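For readers unfamiliar with UPGMA, the following is a minimal textbook sketch that builds a Newick tree from a symmetric distance matrix. It illustrates the general algorithm only and is not the guide-tree code used by MAFFT; the function name `upgma` and the example distances are made up.

```python
from itertools import combinations


def upgma(names, matrix):
    """Build a Newick tree by textbook UPGMA from a symmetric distance matrix."""
    # Each cluster id maps to (newick string, number of leaves, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Pairwise distances, keyed by (smaller id, larger id).
    d = {(i, j): matrix[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)            # closest pair of clusters
        height = d[(a, b)] / 2.0            # new node sits at half the distance
        (na, sa, ha), (nb, sb, hb) = clusters.pop(a), clusters.pop(b)
        newick = f"({na}:{height - ha:.3f},{nb}:{height - hb:.3f})"
        # Distance to every other cluster is the size-weighted average.
        new_d = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            new_d[c] = (sa * dac + sb * dbc) / (sa + sb)
        # Drop distances involving the merged clusters, then add the new ones.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        for c, v in new_d.items():
            d[(c, next_id)] = v             # next_id is always the larger id
        clusters[next_id] = (newick, sa + sb, height)
        next_id += 1
    (newick, _, _), = clusters.values()
    return newick + ";"


# Made-up ultrametric distances between four sequences.
names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree = upgma(names, matrix)
# tree == "(D:5.000,(C:3.000,(A:1.000,B:1.000):2.000):2.000);"
```

Because UPGMA assumes a constant rate of change, all leaves end up at the same distance from the root, which is visible in the branch lengths of the example.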

Mafft-homologs (Collects homologs by PSI-BLAST and aligns homologs with input sequences; Protein only): Help

Show homologs (if any)

Number of homologs: (5 – 600)

Threshold: E = (1e-1 – 1e-40)

Use SwissProt (less comprehensive and requires shorter search time; previous default)

Use UniRef50 (more comprehensive and requires longer search time) 2019/Mar

Plot LAST hits (DNA only):

The top sequence vs the others

The longest sequence vs the others

Plot and alignment

Plot only

Alignment only

Threshold:

References

Katoh, Rozewicki, Yamada 2019 (Briefings in Bioinformatics 20:1160-1166)
MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization
Kuraku, Zmasek, Nishimura, Katoh 2013 (Nucleic Acids Research 41:W22-W28)
aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity

Multiple alignment program for amino acid or nucleotide sequences
