To avoid overloading the server, try the lightweight option for virus genomes (2024/Aug).
The email notification sent at the end of a calculation was not working in August 2024. Fixed on Aug 23.
For a large number of short sequences, try an experimental service.
Experimental service for aligning raw reads (Updated, 2023/Nov)
If you need an MSA of only a specific region, try extracting the region first (2022/Oct).
or upload a plain text file:
Use DASH to add homologous structures (protein only)
Output original plus DASH sequences
Output original sequences only
Give structural alignment(s) externally prepared
Structural alignment 1 (optional): Paste an alignment in FASTA format. These sequences will be aligned with the 'input' sequences above, being used as a constraint.
Structural alignment 2 (optional):
Structural alignment 3 (optional):
Structural alignment 4 (optional):
Allow unusual symbols (Selenocysteine "U", Inosine "i", non-alphabetical characters, etc.)
Same as input
Amino acid → UPPERCASE / Nucleotide → lowercase
Adjust direction according to the first sequence (accurate enough for most cases)
Adjust direction according to the first sequence (only for highly divergent data; extremely slow)
Aligned
(10 – 100)
(basic Latin alphabet, number and space only)
Email address:
Auto (FFT-NS-1, FFT-NS-2, FFT-NS-i or L-INS-i; depends on data size)
FFT-NS-1 (Very fast; recommended for >2,000 sequences; progressive method)
FFT-NS-2 (Fast; progressive method)
G-INS-1 (Slow; progressive method with an accurate guide tree)
FFT-NS-i (Slow; iterative refinement method)
E-INS-i (Very slow; recommended for <200 sequences with multiple conserved domains and long gaps; 2 iterative cycles only; updated 2015/Jun)
L-INS-i (Very slow; recommended for <200 sequences with one conserved domain and long gaps; 2 iterative cycles only)
G-INS-i (Very slow; recommended for <200 sequences with global homology; 2 iterative cycles only)
Q-INS-i (Extremely slow; secondary structure of RNA is considered; recommended for a global alignment of highly divergent ncRNAs with <200 sequences × <1,000 nucleotides; 2 iterative cycles only, since 2016/May)
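For readers who run MAFFT locally instead of through this form, most of the strategies listed above have documented command-line equivalents. The following is a minimal Python sketch of that mapping. The names `STRATEGY_FLAGS` and `build_mafft_command` and the file name `input.fasta` are hypothetical, and the flag combinations are taken from the MAFFT command-line manual, not from this page. How the server itself invokes MAFFT is not stated here; `--maxiterate 2` is an assumption chosen to mirror the "2 iterative cycles only" notes. G-INS-1 and Q-INS-i are left out of the sketch.

```python
# Hypothetical mapping from the strategy names on this form to MAFFT
# command-line flags (per the MAFFT manual). This is an assumption about
# local usage, not a description of what the server runs.
STRATEGY_FLAGS = {
    "Auto": ["--auto"],
    "FFT-NS-1": ["--retree", "1", "--maxiterate", "0"],
    "FFT-NS-2": ["--retree", "2", "--maxiterate", "0"],
    "FFT-NS-i": ["--retree", "2", "--maxiterate", "2"],
    "E-INS-i": ["--genafpair", "--ep", "0", "--maxiterate", "2"],
    "L-INS-i": ["--localpair", "--maxiterate", "2"],
    "G-INS-i": ["--globalpair", "--maxiterate", "2"],
}


def build_mafft_command(strategy, input_path):
    """Return an argv list for a local MAFFT run with the given strategy."""
    try:
        flags = STRATEGY_FLAGS[strategy]
    except KeyError:
        raise ValueError(f"unknown strategy: {strategy!r}") from None
    # MAFFT writes the alignment to standard output, so the caller is
    # expected to redirect or capture it.
    return ["mafft", *flags, input_path]


if __name__ == "__main__":
    print(" ".join(build_mafft_command("L-INS-i", "input.fasta")))
```

The returned list can be passed to `subprocess.run(..., stdout=outfile)` on a machine where `mafft` is installed.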
Unalignlevel: (0.0 – 0.8)
↑ Default
This feature is available only when G-INS-1 or G-INS-i is selected in the Strategy section above.
Try to align gappy regions anyway
Leave gappy regions (Not recommended for >∼1,000 sequences)
Scoring matrix for amino acid sequences: BLOSUM30, BLOSUM45, BLOSUM62, BLOSUM80, JTT100, JTT200
Scoring matrix for nucleotide sequences: 1PAM / κ=2, 20PAM / κ=2, 200PAM / κ=2
↑ Switch it to '1PAM / κ=2' when aligning closely related DNA sequences.
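For a local run, the amino acid matrices listed above can be selected with the `--bl` and `--jtt` options described in the MAFFT command-line manual. The sketch below assumes that correspondence. The names `AMINO_ACID_MATRICES` and `matrix_flags` are hypothetical, and the nucleotide matrices are deliberately not covered.

```python
import re

# Matrix names copied from the form; the tuple name is hypothetical.
AMINO_ACID_MATRICES = (
    "BLOSUM30", "BLOSUM45", "BLOSUM62", "BLOSUM80", "JTT100", "JTT200",
)


def matrix_flags(name):
    """Translate a matrix name from the form into MAFFT CLI flags.

    Assumes --bl N selects BLOSUM N and --jtt N selects JTT PAM N,
    as described in the MAFFT manual.
    """
    if name not in AMINO_ACID_MATRICES:
        raise ValueError(f"not an amino acid matrix on the form: {name!r}")
    # Split e.g. "BLOSUM62" into the family "BLOSUM" and the number "62".
    family, number = re.fullmatch(r"([A-Z]+)(\d+)", name).groups()
    flag = "--bl" if family == "BLOSUM" else "--jtt"
    return [flag, number]


if __name__ == "__main__":
    print(matrix_flags("BLOSUM62"))
```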
Gap opening penalty: (1.0 – 5.0)
Offset value: (0.0 – 1.0)
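The two ranges above can be checked before a job is submitted or a local command is built. This is a minimal sketch under stated assumptions: the bounds are treated as inclusive (the form does not say), the names `GAP_OPEN_RANGE`, `OFFSET_RANGE`, `check_range` and `penalty_flags` are hypothetical, and `--op` / `--ep` are the command-line manual's options for the gap opening penalty and offset value.

```python
# Ranges copied from the form; treating them as inclusive is an assumption.
GAP_OPEN_RANGE = (1.0, 5.0)
OFFSET_RANGE = (0.0, 1.0)


def check_range(name, value, bounds):
    """Return value unchanged, or raise ValueError if it is out of bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside {low} to {high}")
    return value


def penalty_flags(gap_open, offset):
    """Build MAFFT CLI flags for the gap opening penalty and offset value."""
    check_range("gap opening penalty", gap_open, GAP_OPEN_RANGE)
    check_range("offset value", offset, OFFSET_RANGE)
    return ["--op", str(gap_open), "--ep", str(offset)]


if __name__ == "__main__":
    print(penalty_flags(1.53, 0.0))
```

Rejecting out-of-range values up front, instead of clamping them silently, keeps the submitted parameters identical to what the user typed.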
↓ Long stretches of Ns tend to be gapped (excluded from the alignment).
(nzero) N has no effect on the alignment score.
(nwildcard) N is treated like a wildcard. Experimental option (2016/Apr/26)
↑ Try this if Ns should be aligned with usual letters.
Default UPGMA
Output guide tree
To display the tree, follow the “Refine dataset” link on the result page.
On
Show homologs (if any)
Number of homologs: (5 – 600)
Threshold: E = (1e-1 – 1e-40)
Use SwissProt (less comprehensive; shorter search time; previous default)
Use UniRef50 (more comprehensive; longer search time; 2019/Mar)
The top sequence vs the others
The longest sequence vs the others
Plot and alignment
Plot only
Alignment only
Threshold: score=39 (E=8.4e-11), score=22 (E=0.00805), score=20 (E=0.0699), score=12 (E=398)