MAFFT version 7

Multiple alignment program for amino acid or nucleotide sequences

The --text mode is supported in versions ≥7.120.

Non-biological sequences

Non-biological sequences, or texts consisting of printable characters, can be aligned in the --text mode. 


> text 1
> text 2

The simplest command:

% mafft --text input > output

Other options are also available in this mode.

% mafft-ginsi --text input > output
% mafft --text --clustalout input > output

Output of --text:

*** **             ** ****

Extended alphabet (in alpha testing)

Versions 7.270 and higher accept extended characters such as ö and ä.

The text must be in an 8-bit encoding, such as LATIN1, Windows-1252 or Mac OS Roman.  UTF-8 text can be converted to and from LATIN1 with the iconv program on Linux, provided the text uses Western European alphabets only.

% iconv -f UTF-8 -t LATIN1 input.utf8 > input.latin1
% mafft --text input.latin1 > output.latin1
% iconv -f LATIN1 -t UTF-8 output.latin1 > output.utf8
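
The conversion to LATIN1 only succeeds if every character in the text has a LATIN1 equivalent. As a minimal sketch, the check below relies on iconv exiting with a non-zero status when it meets a character it cannot convert, which is how the glibc iconv on Linux behaves when neither -c nor //TRANSLIT is given. The function name can_convert_to_latin1 is hypothetical and not part of MAFFT, and the file name in the usage comment is a placeholder.

```shell
# Hypothetical pre-check, not part of MAFFT.
# Succeeds only if the whole UTF-8 file can be represented in LATIN1.
can_convert_to_latin1() {
    # Discard the converted text; only the exit status of iconv is of interest.
    iconv -f UTF-8 -t LATIN1 "$1" > /dev/null 2>&1
}

# Example use (input.utf8 is a placeholder file name):
# can_convert_to_latin1 input.utf8 || echo "input.utf8 has non-LATIN1 characters" >&2
```

Running such a check first avoids passing a truncated or partially converted file to mafft --text.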

The acceptable characters in these versions are 0x00-0xFF, excluding > (0x3E), = (0x3D), < (0x3C), - (0x2D), Space (0x20), Carriage Return (0x0D), Line Feed (0x0A) and NULL (0x00).  These 8 exclusions leave 256 - 8 = 248 characters, so the maximum alphabet size is 248.
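
The excluded characters listed above can be screened for before running mafft. The following is a rough sketch rather than anything shipped with MAFFT: the function name check_text_input is hypothetical, and it assumes a FASTA-like input in which lines starting with > are headers. It reports body lines that contain >, =, <, -, a space or a carriage return. It does not look for NULL bytes, and a line feed cannot occur inside a line.

```shell
# Hypothetical helper, not part of MAFFT.
# Prints every body line that contains an excluded character and
# returns a non-zero status if at least one such line was found.
check_text_input() {
    # LC_ALL=C makes awk treat the input as raw bytes (8-bit text is fine).
    LC_ALL=C awk '
        BEGIN { bad = 0 }
        /^>/ { next }                       # skip header lines
        /[<=> \r-]/ {                       # excluded: < = > space CR -
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Example use (input.latin1 is a placeholder file name):
# check_text_input input.latin1 && mafft --text input.latin1 > output.latin1
```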

If non-text data is mapped to this range of characters, the data can be aligned by mafft --text.  (Not tested yet.)
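
One possible way to map non-text data into the acceptable range is sketched here. Everything in it is an assumption made for illustration: the function name encode_codes is hypothetical, the data is assumed to be lines of space-separated integer codes between 0 and 61, and each code is mapped to one letter or digit, so none of the excluded characters can appear. Like the feature itself, this mapping has not been tested with mafft.

```shell
# Hypothetical helper, not part of MAFFT.
# Converts lines of space-separated integer codes (0-61) into one symbol
# per code. Header lines (starting with ">") are passed through unchanged.
encode_codes() {
    awk '
        BEGIN { alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789" }
        /^>/ { print; next }
        {
            out = ""
            for (i = 1; i <= NF; i++) {
                # Reject anything that is not an integer inside the alphabet.
                if ($i !~ /^[0-9]+$/ || $i + 0 >= length(alphabet)) {
                    printf "line %d: code out of range: %s\n", NR, $i > "/dev/stderr"
                    exit 1
                }
                out = out substr(alphabet, $i + 1, 1)
            }
            print out
        }
    ' "$1"
}

# Example use (codes.txt and encoded.txt are placeholder file names):
# encode_codes codes.txt > encoded.txt && mafft --text encoded.txt > output
```

A larger code range would need a larger alphabet, up to the 248 characters allowed above. The aligned output would then have to be decoded with the inverse mapping.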

Difference between --text and --anysymbol

The --anysymbol option also accepts input data with non-alphabetical symbols.  This option is for amino acid or nucleotide sequences that contain unusual symbols, such as U and i.  Input sequences are interpreted as amino acid or nucleotide sequences, unlike --text.

An alignment by --anysymbol:

                   **  * *