If other alphabets are included in the input data, then the calculation stops with an error message. Non-alphabetical characters (except the period) are removed by default. If you need to keep these characters, use the --anysymbol option below.
Sequence type (protein / nucleotide) is automatically recognized based on the frequency of a, t, g, c, and u, unless the --amino or --nuc flag is given.
% mafft --anysymbol input > output
It accepts any printable characters (U, O, #, $, %, etc.; 0x21-0x7e in the ASCII code), except for > (0x3e) and ( (0x28). Unusual characters are scored as unknown (not considered in the calculation), unlike in the --text mode.
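The accepted character range stated above can be checked before running the alignment. The following is a minimal sketch in Python, not part of MAFFT; the names is_accepted and rejected_characters are hypothetical helpers introduced only for illustration, and the sequence is assumed to have its line breaks already removed.

```python
# Characters accepted by --anysymbol according to the text above:
# printable ASCII 0x21-0x7e, except '>' (0x3e) and '(' (0x28).
REJECTED = {">", "("}


def is_accepted(ch):
    """Return True if a single character lies in the accepted set."""
    return 0x21 <= ord(ch) <= 0x7E and ch not in REJECTED


def rejected_characters(sequence):
    """Return the distinct characters of `sequence` that lie outside the
    accepted set, in order of first appearance (hypothetical helper)."""
    seen = []
    for ch in sequence:
        if not is_accepted(ch) and ch not in seen:
            seen.append(ch)
    return seen
```

An empty list from rejected_characters means that every character of the sequence lies in the accepted range; it says nothing about how MAFFT scores those characters.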
When the input data is:
>
SampleSequenceWithUnusualCharacter
>
Sample#Sequence_With%Various^Unusual*Characters
>
SAMPLESEQUENCE
The result will be:
>
Sample-Sequence-With---------Unusual-Character-
>
Sample#Sequence_With%Various^Unusual*Characters
>
SAMPLE-SEQUENCE--------------------------------
Upper/lower case is preserved. The --anysymbol option is internally equivalent to the --preservecase option.
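The example result can be sanity-checked with a short script. This is a sketch under two assumptions: the input sequences contain no "-" characters of their own, and the aligned rows have already been read into strings. The function name is_consistent_alignment is hypothetical and not part of MAFFT. The script checks that all aligned rows have the same length and that removing the gaps gives back each input exactly, including its case and unusual characters.

```python
INPUTS = [
    "SampleSequenceWithUnusualCharacter",
    "Sample#Sequence_With%Various^Unusual*Characters",
    "SAMPLESEQUENCE",
]

ALIGNED = [
    "Sample-Sequence-With---------Unusual-Character-",
    "Sample#Sequence_With%Various^Unusual*Characters",
    # Trailing gaps written as a repeat count for readability.
    "SAMPLE-SEQUENCE" + "-" * 32,
]


def is_consistent_alignment(inputs, aligned):
    """Return True if the rows share one length and each row, with gaps
    removed, equals the corresponding input (case-sensitive)."""
    if len(inputs) != len(aligned):
        return False
    if len({len(row) for row in aligned}) != 1:
        return False
    return [row.replace("-", "") for row in aligned] == list(inputs)


print(is_consistent_alignment(INPUTS, ALIGNED))
```

Because the comparison is case-sensitive, the check fails if a tool changes the case of any sequence.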
For aligning non-biological sequences, use the --text mode, in which unusual characters are also considered in the alignment calculation.