Input:
> text 1
2008~KATO~Toh
> text 2
2005~Katoh~Kuma~MIYATA~Toh
The simplest command:
% mafft --text input > output
Other options are also available in this mode.
% mafft-ginsi --text input > output
% mafft --text --clustalout input > output
etc
In versions <7.369, the combination with --globalpair or --localpair sometimes failed.
In versions 7.395 – 7.409, the combination with --clustalout did not work. This bug will be fixed soon (2019/Jan).
Output of --text:
2008~K-------------ATO~Toh
2005~Katoh~Kuma~MIYATA~Toh
*** **             ** ****
------Northern_part_of----_Cha_das_Caldeiras,_near_Fernao_Gomes.
------Northern_part_of_the_Cha_das_Caldeiras-_near_Fernão_Gomes-
----------------------------------------------Near_Fernão_Gomes-
Fógo:_northern_part_of_the_Cha_das_Caldeiras._------------------
Text must be in an 8-bit encoding, such as LATIN1, Windows-1252 or Mac OS Roman. On Linux, UTF-8 text can be converted to and from LATIN1 with the iconv program, provided that the text uses only Western European alphabets.
% iconv -f UTF-8 -t LATIN1 input.utf8 > input.latin1
% mafft --text input.latin1 > output.latin1
% iconv -f LATIN1 -t UTF-8 output.latin1 > output.utf8
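The conversion step above can be wrapped in a small check so that text outside the 8-bit range is caught before mafft runs. This is only a sketch: utf8_to_latin1 is a hypothetical helper name, not part of MAFFT, and it assumes an iconv that exits with a non-zero status when the input cannot be converted (GNU iconv behaves this way; other implementations may substitute characters silently instead).

```shell
# Hypothetical helper (not part of MAFFT): convert a UTF-8 file to LATIN1,
# removing the partial output and returning 1 if iconv reports a failure.
# Assumption: iconv exits non-zero on invalid or unconvertible input.
utf8_to_latin1() {
    src=$1
    dst=$2
    if iconv -f UTF-8 -t LATIN1 "$src" > "$dst"; then
        return 0
    fi
    echo "utf8_to_latin1: $src could not be converted to LATIN1" >&2
    rm -f "$dst"
    return 1
}

# Example use (commented out so that sourcing this file runs nothing):
# utf8_to_latin1 input.utf8 input.latin1 && mafft --text input.latin1 > output.latin1
```

If the check fails, the hexadecimal route via hex2maffttext may be an alternative, since it works on byte codes rather than on a particular 8-bit encoding.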
Two format converters, hex2maffttext and maffttext2hex, are bundled in versions ≥7.390 to make it easier to handle the full alphabet of 248 symbols.
Usage:
(1) Prepare an input file, input.hex, in hexadecimal code (in the range explained above), using a space as the separator. The title of each sequence should be marked by >.
>sequence1
01 02 03 4e 6f 72 74 68 65 72 6e 5f 70 61 ...
>sequence2
01 02 03 4e 6f 72 74 68 65 72 6e 5f 70 61 ...
>sequence3
a3 6f 5f 47 6f 6d 65 73 ...
>sequence4
01 02 03 46 c3 b3 67 6f 3a 5f 6e 6f 72 74 68 65 72 6e 5f 70 61 ...
(2) Convert this file to ASCII code (including printable characters and control characters):
% /usr/local/libexec/mafft/hex2maffttext input.hex > input.ASCII
On Windows PowerShell, which uses UTF-16 by default, it is necessary to convert to ASCII in two steps (2022/Aug):
PS C:\somewhere> usr\lib\mafft\hex2maffttext input.hex > input.utf16
PS C:\somewhere> Get-Content input.utf16 | Set-Content -Encoding ASCII input.ASCII
(3) Run mafft --text
% mafft --text --clustalout input.ASCII > output.ASCII
(4) The output can be converted back to hexadecimal notation by:
% /usr/local/libexec/mafft/maffttext2hex output.ASCII > output.hex
Result:
CLUSTAL format alignment by MAFFT (v7.390)

sequence1 01 02 03 -- -- -- -- -- -- -- 4e 6f 72 74 68 65 72 6e 5f 70 61 ...
sequence2 01 02 03 -- -- -- -- -- -- -- 4e 6f 72 74 68 65 72 6e 5f 70 61 ...
sequence3 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- ...
sequence4 01 02 03 46 c3 b3 67 6f 3a 5f 6e 6f 72 74 68 65 72 6e 5f 70 61 ...

sequence1 61 6f 5f 47 6f 6d 65 73 2e
sequence2 a3 6f 5f 47 6f 6d 65 73 --
sequence3 a3 6f 5f 47 6f 6d 65 73 --
sequence4 -- -- -- -- -- -- -- -- --
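Step (1) above needs a file of space-separated hexadecimal byte codes. If the sequences already exist as plain text, something like the following sketch could produce it. The function text_to_hex_input is a hypothetical helper, not part of MAFFT; it relies only on the standard od, tr and sed utilities. It emits whatever bytes the text holds in its current encoding, and it does not check that the codes fall inside the range hex2maffttext accepts, so that remains the user's responsibility.

```shell
# Hypothetical helper (not part of MAFFT): read FASTA-like text on stdin and
# write the hexadecimal input format on stdout. Title lines (starting with '>')
# are copied unchanged; every other line becomes space-separated hex bytes.
# No range check is made on the resulting codes.
text_to_hex_input() {
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            '>'*)
                printf '%s\n' "$line"
                ;;
            *)
                # od prints the bytes in hex; tr and sed normalise the spacing.
                hex=$(printf '%s' "$line" | od -An -v -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
                printf '%s\n' "$hex"
                ;;
        esac
    done
}

# Example use (commented out so that sourcing this file runs nothing):
# text_to_hex_input < input.txt > input.hex
```

For instance, the text Northern_pa is converted to 4e 6f 72 74 68 65 72 6e 5f 70 61, which matches the corresponding codes in the example input above.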
% mafft --textmatrix matrixfile input > output
The format of matrixfile is:
0x01 0x01 2   # (comment)
0x1e 0x1e 2
0x1f 0x1f 2
0x21 0x21 2   # ! × !
0x41 0x41 2   # A × A
0x42 0x42 2   # B × B
0x43 0x43 2   # C × C
0x44 0x44 2   # D × D
0x30 0x30 2   # 0 × 0
0x31 0x31 2   # 1 × 1
0x32 0x32 2   # 2 × 2
0x33 0x33 2   # 3 × 3
0x34 0x34 2   # 4 × 4
0x41 0x30 0.5 # A × 0
0x30 0x41 0.5 # 0 × A (Unnecessary in versions ≥ 7.400)
0x42 0x31 0.5 # B × 1
0x31 0x42 0.5 # 1 × B (Unnecessary in versions ≥ 7.400)
0x46 0x35 0.5
0x35 0x46 0.5 # (Unnecessary in versions ≥ 7.400)

It is not necessary to give all 248×248 pairs. If a score for a pair is given in this file, it overrides the default score for that pair. If a pair does not appear in this file, the default score is used for it. Text after '#' is ignored.
In versions < 7.400, a mismatch score between letters p and q (p≠q) had to be specified twice (i.e., p×q and q×p). In versions ≥ 7.400, this is unnecessary; if a score for p×q is set, then the same score is also used for q×p.
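For versions < 7.400, the reverse entries can be generated mechanically rather than typed by hand. The sketch below assumes a matrix file in the whitespace-separated format shown above, with each mismatched pair listed only once; symmetrize_matrix is a hypothetical helper name, not a MAFFT tool. Generated reverse lines carry no comment.

```shell
# Hypothetical helper (not part of MAFFT): copy a matrix file from stdin to
# stdout and, after every "p q score" line with p != q, add "q p score".
# Assumption: each mismatched pair appears once in the input; pairs already
# listed in both directions would be duplicated.
symmetrize_matrix() {
    awk '
        {
            entry = $0
            sub(/#.*/, "", entry)        # ignore the comment when parsing
            n = split(entry, f, " ")     # " " means split on runs of whitespace
            print $0                     # keep the original line as it is
            if (n >= 3 && f[1] != f[2])
                print f[2], f[1], f[3]   # add the reverse pair
        }
    '
}

# Example use (commented out so that sourcing this file runs nothing):
# symmetrize_matrix < matrixfile > matrixfile.symmetric
```

With versions ≥ 7.400 the original file can be passed to --textmatrix directly, so this step is not needed.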
An alignment by --anysymbol:
-------------2008~KATO~Toh
2005~Katoh~Kuma~MIYATA~Toh
** * *