MAFFT alignment for a large number of sequences

This option is not for SARS-CoV-2. To align full-length genomes of SARS-CoV-2, try another experimental option (2020/Apr).

This service has been experimental since 2017/Aug. The upper limit on data size and other settings may change as actual cases are tested.

Typical data size is up to ∼200,000 sequences × ∼5,000 sites (including gaps), but the practical limit depends on sequence similarity. Not suitable for long genomic sequences.
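A rough pre-submission check can compare a FASTA file against the typical size quoted above. The sketch below is a hypothetical client-side helper (`read_fasta` and `rough_size_check` are names introduced here, not part of MAFFT). It counts sequences and reports the longest ungapped sequence. That length is only a lower bound on the aligned length, because gaps inserted during alignment add further sites.

```python
from typing import Iterable, Iterator

# Typical upper bounds quoted on this page; the real limit depends on similarity.
TYPICAL_MAX_SEQUENCES = 200_000
TYPICAL_MAX_SITES = 5_000


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (title, sequence) pairs from FASTA-formatted lines."""
    title, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        else:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def rough_size_check(lines: Iterable[str]) -> tuple[int, int, bool]:
    """Return (sequence count, longest ungapped length, within typical size?)."""
    count, longest = 0, 0
    for _, seq in read_fasta(lines):
        count += 1
        longest = max(longest, len(seq.replace("-", "")))
    # Only a lower bound on aligned sites: alignment will add gap columns.
    within = count <= TYPICAL_MAX_SEQUENCES and longest <= TYPICAL_MAX_SITES
    return count, longest, within
```

For a file on disk, `rough_size_check(open("input.fasta"))` would work, with the file name being a placeholder.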

Input:
Upload DNA or protein sequences (FASTA format) as a plain text file
or as a .zip file containing a single plain text file,

or paste sequences (FASTA format) directly:
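An archive in the accepted form (one plain text file inside a .zip) can be produced with Python's standard `zipfile` module. This is a minimal sketch. The helper name `zip_single_fasta` and the inner file name `input.fasta` are arbitrary choices made here, not names required by the service.

```python
import io
import zipfile


def zip_single_fasta(fasta_text: str, inner_name: str = "input.fasta") -> bytes:
    """Return the bytes of a .zip archive holding exactly one plain text file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # writestr encodes a str argument as UTF-8.
        zf.writestr(inner_name, fasta_text)
    return buffer.getvalue()
```

To write the archive to disk, for example: `open("input.zip", "wb").write(zip_single_fasta(text))`.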

Allow unusual symbols (Selenocysteine "U", Inosine "i", non-alphabetical characters, etc.)

UPPERCASE / lowercase:

Same as input

Amino acid → UPPERCASE / Nucleotide → lowercase

Direction of nucleotide sequences:

Same as input

Adjust direction according to the first sequence (accurate enough for most cases)

Adjust direction according to the first sequence (only for highly divergent data; extremely slow)
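The idea behind direction adjustment can be illustrated with a toy heuristic. Each sequence and its reverse complement are compared with the first sequence, and whichever orientation shares more k-mers is kept. This is a simplified sketch, not MAFFT's actual implementation. The k-mer size of 6 and the function names are assumptions made here.

```python
# Complement table for DNA/RNA letters; N maps to itself.
COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def adjust_direction(seqs: list[str], k: int = 6) -> list[str]:
    """Orient every sequence to match the first one (toy k-mer heuristic)."""
    if not seqs:
        return []
    ref = kmers(seqs[0], k)
    oriented = [seqs[0]]
    for seq in seqs[1:]:
        rc = reverse_complement(seq)
        forward_hits = len(ref & kmers(seq, k))
        reverse_hits = len(ref & kmers(rc, k))
        # Keep the input direction unless the reverse complement shares more k-mers.
        oriented.append(rc if reverse_hits > forward_hits else seq)
    return oriented
```

A k-mer count is cheap but can be fooled by highly divergent sequences with few shared k-mers.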

Output order:

Same as input

Aligned

Title length in Clustal format (only the first word of each title is used):

(10 – 100 characters)
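A sketch of how a Clustal title might be derived from a FASTA title line. It assumes the first word is simply cut to the chosen length. The truncation rule and the name `clustal_title` are assumptions, not taken from MAFFT's code.

```python
def clustal_title(fasta_title: str, title_length: int) -> str:
    """Keep only the first word of a FASTA title, cut to title_length characters."""
    if not 10 <= title_length <= 100:
        raise ValueError("title length must be between 10 and 100")
    words = fasta_title.split()
    first_word = words[0] if words else ""
    return first_word[:title_length]
```

For example, with a length of 10 the title `sp|P69905|HBA_HUMAN Hemoglobin subunit alpha` would become `sp|P69905|`. Titles that share their first 10 characters would then be indistinguishable, which is one reason to choose a longer title length.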

Job name (optional):

(basic Latin letters, digits and spaces only)
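A hypothetical client-side check for the job-name rule. It assumes "basic Latin letters" means A-Z and a-z, and that an empty name is acceptable because the field is optional. The service's own validation may differ.

```python
import re

# Letters A-Z / a-z, digits 0-9 and the space character only (assumed rule).
JOB_NAME_PATTERN = re.compile(r"[A-Za-z0-9 ]*")


def is_valid_job_name(name: str) -> bool:
    return JOB_NAME_PATTERN.fullmatch(name) is not None
```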

Notify when finished (optional; recommended when submitting large datasets):

Email address:

Memory usage (applies to FFT-NS-1, FFT-NS-2 and mafft-sparsecore):

Normal mode

Low-memory mode (accepts more than 100,000 sequences, but is slower and slightly less accurate than normal mode)

Parameters:

Scoring matrix for amino acid sequences:

Scoring matrix for nucleotide sequences:

↑ Switch to '1PAM / κ=2' when aligning closely related DNA sequences.

Gap opening penalty: (1.0 – 5.0)

Offset value: (0.0 – 1.0)

Score of N in nucleotide data:

↓ Long stretches of Ns tend to be gapped (excluded from the alignment).

(nzero) N has no effect on the alignment score.

(nwildcard) N is treated as a wildcard. Experimental option (2016/Apr/26).

↑ Try this if Ns should be aligned with regular nucleotide letters.
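The effect of the three N settings can be seen in a toy column-by-column score. The numbers used here are illustrative assumptions, not MAFFT's actual scoring matrix: match +1, mismatch -1, and N penalty -1. The reading of "wildcard" as "scored as a match" is also an assumption.

```python
MATCH, MISMATCH, N_PENALTY = 1, -1, -1  # illustrative values only
N_MODES = ("default", "nzero", "nwildcard")


def column_score(a: str, b: str, n_mode: str) -> int:
    """Toy score for one aligned column of two nucleotides."""
    if n_mode not in N_MODES:
        raise ValueError(f"unknown mode: {n_mode}")
    a, b = a.upper(), b.upper()
    if "N" in (a, b):
        if n_mode == "nzero":
            return 0          # N contributes nothing to the score
        if n_mode == "nwildcard":
            return MATCH      # N scores as if it matched any letter (assumption)
        return N_PENALTY      # default-like: N is penalised at every site
    return MATCH if a == b else MISMATCH


def ungapped_score(seq1: str, seq2: str, n_mode: str = "default") -> int:
    """Sum of column scores for two equal-length sequences without gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(column_score(a, b, n_mode) for a, b in zip(seq1, seq2))
```

With these values, `ANNNA` against `ACGTA` scores -1, 2 and 5 under the three modes. Under the default-like setting, a long run of Ns aligned against letters is penalised at every site. An affine gap cost, by contrast, is charged mainly once for opening. This is one way to see why long N stretches tend to be gapped.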

Plot LAST hits (DNA only):

The top sequence vs the others

The longest sequence vs the others

Plot and alignment

Plot only

Alignment only

Threshold:

Katoh, Rozewicki, Yamada 2019 (Briefings in Bioinformatics 20:1160-1166)
MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization
Kuraku, Zmasek, Nishimura, Katoh 2013 (Nucleic Acids Research 41:W22-W28)
aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity

Multiple alignment program for amino acid or nucleotide sequences