To avoid overload,
a light-weight option for MSA of full-length SARS-CoV-2 genomes
For a large number of short sequences, try an experimental service.
Experimental service for aligning raw reads (Updated, 2023/Nov)
If you need an MSA of only a specific region, try extracting that region first (2022/Oct).
or upload a plain text file:
Use DASH to add homologous structures (protein only)
Output original plus DASH sequences
Output original sequences only
Give structural alignment(s) externally prepared
Structural alignment 1 (optional):
Paste an alignment in fasta format. Example
These sequences will be aligned with the input sequences above, and their structural alignment will be used as a constraint.
Structural alignment 2 (optional):
Structural alignment 3 (optional):
Structural alignment 4 (optional):
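A structural alignment pasted in FASTA format is an aligned FASTA: every row, gaps included, should have the same length. The sketch below parses such text and rejects ragged input before it is submitted. The function name parse_aligned_fasta is hypothetical, and the checks are assumptions about what a well-formed aligned FASTA looks like, not the service's own validation.

```python
def parse_aligned_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {name: row}; hypothetical pre-submission check."""
    records: dict[str, list[str]] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            if name in records:
                raise ValueError(f"duplicate name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name].append(line)  # rows may be wrapped over several lines
    rows = {n: "".join(parts) for n, parts in records.items()}
    # An alignment requires all rows (including '-' gaps) to have equal length.
    if len({len(r) for r in rows.values()}) > 1:
        raise ValueError("rows differ in length; this is not an alignment")
    return rows
```

For example, parse_aligned_fasta(">s1\nAC-GT\n>s2\nACAGT\n") returns both rows, while rows of lengths 5 and 4 raise a ValueError.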
Allow unusual symbols (Selenocysteine "U", Inosine "i", non-alphabetical characters, etc.)
Same as input
Amino acid → UPPERCASE / Nucleotide → lowercase
Adjust direction according to the first sequence (accurate enough for most cases)
Adjust direction according to the first sequence (only for highly divergent data; extremely slow)
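The form does not say how sequence direction is decided. One simple way to illustrate the idea is to compare each sequence and its reverse complement against the first sequence and keep whichever shares more k-mers. This is a hypothetical heuristic, not MAFFT's method; the names orient and adjust_directions and the default k = 6 are assumptions.

```python
# Complement table for DNA (upper and lower case); N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of seq, uppercased."""
    s = seq.upper()
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def orient(reference: str, query: str, k: int = 6) -> str:
    """Return query or its reverse complement, whichever shares more k-mers
    with reference (ties keep the original direction)."""
    ref = kmers(reference, k)
    rc = reverse_complement(query)
    forward = len(ref & kmers(query, k))
    backward = len(ref & kmers(rc, k))
    return rc if backward > forward else query


def adjust_directions(seqs: list[str], k: int = 6) -> list[str]:
    """Orient every sequence relative to the first one (hypothetical sketch)."""
    first = seqs[0]
    return [first] + [orient(first, s, k) for s in seqs[1:]]
```

Given ["ATGGCGTACGTTAGC", "GCTAACGTACGCCAT"], the second sequence is the reverse complement of the first, so it is flipped and both entries come back identical. Counting shared k-mers is cheap, which matches the spirit of the faster option; a slower, more accurate variant might score actual alignments in both directions instead.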
Same as input
(10 – 100)
(basic Latin alphabet, number and space only)
Auto (FFT-NS-1, FFT-NS-2, FFT-NS-i or L-INS-i; depends on data size)
FFT-NS-1 (Very fast; recommended for >2,000 sequences; progressive method)
FFT-NS-2 (Fast; progressive method)
G-INS-1 (Slow; progressive method with an accurate guide tree)
FFT-NS-i (Slow; iterative refinement method)
E-INS-i (Very slow; recommended for <200 sequences with multiple conserved domains and long gaps; 2 iterative cycles only)
L-INS-i (Very slow; recommended for <200 sequences with one conserved domain and long gaps; 2 iterative cycles only)
G-INS-i (Very slow; recommended for <200 sequences with global homology; 2 iterative cycles only)
(Extremely slow; secondary structure of RNA is considered; recommended for a global alignment of highly divergent ncRNAs with <200 sequences × <1,000 nucleotides; the number of iterative cycles is restricted to two, 2016/May)
This feature is available only when G-INS-1 or G-INS-i is selected in the Strategy section above.
Try to align gappy regions anyway
Leave gappy regions (Not recommended for >∼1,000 sequences)
Scoring matrix for amino acid sequences:
Scoring matrix for nucleotide sequences:
1PAM / κ=2
20PAM / κ=2
200PAM / κ=2
↑ Switch it to '1PAM / κ=2' when aligning closely related DNA sequences.
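The nucleotide options above pair an evolutionary distance (1, 20 or 200 PAM) with κ = 2. The sketch below shows why a small PAM value suits closely related DNA: it derives log-odds scores from the Kimura two-parameter (K2P) model, reading κ as the transition/transversion rate ratio and "n PAM" as n expected substitutions per 100 sites. These readings, the uniform base frequencies, and natural-log scores are assumptions; MAFFT's actual matrix construction and scaling may differ.

```python
import math

PURINES = {"A", "G"}


def k2p_probabilities(pam: float, kappa: float) -> tuple[float, float, float]:
    """K2P probabilities after `pam` substitutions per 100 sites:
    (same base, one specific transition, one specific transversion)."""
    d = pam / 100.0           # expected substitutions per site
    b = d / (kappa + 2.0)     # transversion rate * time, from a + 2b = d
    a = kappa * b             # transition rate * time
    e1 = math.exp(-4.0 * b)
    e2 = math.exp(-2.0 * (a + b))
    p_same = 0.25 + 0.25 * e1 + 0.5 * e2
    p_ts = 0.25 + 0.25 * e1 - 0.5 * e2
    p_tv = 0.25 - 0.25 * e1
    return p_same, p_ts, p_tv


def is_transition(x: str, y: str) -> bool:
    """A<->G and C<->T changes are transitions; the rest are transversions."""
    return x != y and ((x in PURINES) == (y in PURINES))


def score(x: str, y: str, pam: float, kappa: float = 2.0) -> float:
    """Log-odds score log(P(y | x) / 0.25), assuming uniform base frequencies."""
    p_same, p_ts, p_tv = k2p_probabilities(pam, kappa)
    x, y = x.upper(), y.upper()
    if x == y:
        p = p_same
    elif is_transition(x, y):
        p = p_ts
    else:
        p = p_tv
    return math.log(p / 0.25)
```

Under these assumptions, a match scores about 1.38 at 1 PAM but only about 0.21 at 200 PAM, while an A/C transversion drops to about -4.6 at 1 PAM. Mismatches are therefore penalized far more heavily at short distances, which is the behavior wanted for closely related sequences.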
Gap opening penalty: (1.0 – 5.0)
(0.0 – 1.0)
↓ Long stretches of Ns tend to be gapped (excluded from the alignment).
(nzero) N has no effect on the alignment score.
(nwildcard) N is treated like a wildcard.
Experimental option (2016/Apr/26)
↑ Try this if Ns should be aligned with usual letters.
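The two N options differ in what a pair involving N contributes to the score. With a zero score, a run of Ns earns nothing by being aligned, so a gap is often no worse. With wildcard behavior, N earns a match-like score and is pulled into alignment with ordinary letters. The sketch below illustrates this using toy +1/-1 scores. The function names are hypothetical, and "wildcard = best score the other base could get" is an assumption, not MAFFT's documented value.

```python
from typing import Callable

ScoreFn = Callable[[str, str], float]


def pair_score(x: str, y: str, base_score: ScoreFn, mode: str) -> float:
    """Score one aligned nucleotide pair, treating N according to mode."""
    x, y = x.upper(), y.upper()
    if "N" not in (x, y):
        return base_score(x, y)
    if mode == "nzero":
        return 0.0  # N neither rewards nor penalizes the alignment
    if mode == "nwildcard":
        # Assumption: N scores as well as the best-matching concrete base.
        if x == "N" and y == "N":
            return max(base_score(b, b) for b in "ACGT")
        other = y if x == "N" else x
        return max(base_score(other, b) for b in "ACGT")
    raise ValueError(f"unknown mode: {mode}")


def simple_score(x: str, y: str) -> float:
    """Toy match/mismatch scores (hypothetical)."""
    return 1.0 if x == y else -1.0


def score_alignment(a: str, b: str, mode: str) -> float:
    """Sum pair scores over columns where neither row has a gap."""
    return sum(
        pair_score(x, y, simple_score, mode)
        for x, y in zip(a, b)
        if x != "-" and y != "-"
    )
```

For the rows "ACGNNT" and "ACGTAT", the two N columns add 0 under "nzero" (total 4.0) but 2 under "nwildcard" (total 6.0).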
Output guide tree
To display the tree, follow the “Refine dataset” link in the result page.
Show homologs (if any)
Number of homologs: (5 – 600)
Threshold: E = (1e-1 – 1e-40)
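The homolog options combine a count limit (5 to 600) with an E-value threshold (1e-1 to 1e-40). The sketch below shows one plausible way the two could interact: drop hits above the threshold, sort the rest by E-value, and keep at most the requested number. The Hit type, the function name, and the filter-then-truncate order are assumptions for illustration; the form does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A hypothetical database search hit."""
    name: str
    evalue: float


def select_homologs(hits: list[Hit], max_homologs: int, threshold: float) -> list[Hit]:
    """Keep hits with E <= threshold, best (smallest E) first, at most max_homologs."""
    if not 5 <= max_homologs <= 600:
        raise ValueError("number of homologs must be between 5 and 600")
    if not 1e-40 <= threshold <= 1e-1:
        raise ValueError("threshold must be between 1e-40 and 1e-1")
    kept = [h for h in hits if h.evalue <= threshold]
    kept.sort(key=lambda h: h.evalue)
    return kept[:max_homologs]
```

A stricter (smaller) threshold can return fewer hits than the requested number, so the count acts as an upper bound rather than a guarantee.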
Use SwissProt (less comprehensive but faster to search; previous default)
Use UniRef50 (more comprehensive but slower to search; 2019/Mar)
The top sequence vs the others
The longest sequence vs the others
Plot and alignment
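The last options choose which sequence every other sequence is compared against: the top (first) one or the longest one. Assuming the plot is a dot plot of exact k-mer matches (an assumption; the form does not describe it), the comparison can be sketched as follows. The function names and the default k = 4 are hypothetical.

```python
def choose_reference(seqs: list[str], mode: str) -> int:
    """Index of the reference: 'top' = first sequence, 'longest' = longest
    (the earliest one wins ties)."""
    if mode == "top":
        return 0
    if mode == "longest":
        return max(range(len(seqs)), key=lambda i: len(seqs[i]))
    raise ValueError(f"unknown mode: {mode}")


def dot_plot(ref: str, other: str, k: int = 4) -> list[tuple[int, int]]:
    """(i, j) start positions where ref[i:i+k] equals other[j:j+k]."""
    index: dict[str, list[int]] = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k].upper(), []).append(i)
    points = []
    for j in range(len(other) - k + 1):
        for i in index.get(other[j:j + k].upper(), []):
            points.append((i, j))
    return points


def plot_against_reference(seqs: list[str], mode: str, k: int = 4) -> dict[int, list[tuple[int, int]]]:
    """Dot-plot points of the reference against each other sequence."""
    r = choose_reference(seqs, mode)
    return {idx: dot_plot(seqs[r], s, k) for idx, s in enumerate(seqs) if idx != r}
```

With ["ACGT", "ACGTACGT", "TTTT"], "top" picks index 0 and "longest" picks index 1. Against "ACGTACGT", the sequence "ACGT" produces points (0, 0) and (4, 0), while "TTTT" produces none.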