FASTA format description


A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that every line of text be shorter than 80 characters. An example sequence in FASTA format is:
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
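
The layout described above can be read with a few lines of code. The following is a minimal sketch, not a reference implementation: the function name parse_fasta is hypothetical, and the choices to skip blank lines and to ignore any sequence data that appears before the first ">" line are assumptions of this example rather than rules stated in the format description.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted lines."""
    description = None
    chunks = []
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue  # assumption: blank lines carry no data
        if line.startswith(">"):  # ">" in the first column starts a record
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            # Join the sequence lines, dropping any embedded whitespace.
            chunks.append("".join(line.split()))
        # assumption: data before the first ">" line is ignored
    if description is not None:
        yield description, "".join(chunks)


example = [
    ">gi|532319|pir|TVFV2E|TVFV2E envelope protein",
    "ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT",
    "LAAVEAQQQMLKLTIWGVK",
]
for description, sequence in parse_fasta(example):
    print(description, len(sequence))
```

Because the function accepts any iterable of lines, an open file object can be passed in directly instead of a list.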

IUPAC-IUB Codes


Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below). Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for unknown nucleic acid residue or X for unknown amino acid residue).
The nucleic acid codes supported are:
     A --> adenosine           M --> A C (amino)
     C --> cytidine            S --> G C (strong)
     G --> guanosine           W --> A T (weak)
     T --> thymidine           B --> G T C
     U --> uridine             D --> G A T
     R --> G A (purine)        H --> A C T
     Y --> T C (pyrimidine)    V --> G C A
     K --> G T (keto)          N --> A G C T (any)
                               - --> gap of indeterminate length
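
As an illustration of how the ambiguity codes in this table can be used, the sketch below stores each code with the set of bases it stands for and checks whether a concrete sequence is compatible with a degenerate pattern. The names IUPAC_NUCLEOTIDES and pattern_matches are hypothetical. The gap character is left out on purpose, because a gap of indeterminate length cannot be compared position by position.

```python
# Each code maps to the bases it may represent, copied from the table above.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}


def pattern_matches(pattern: str, sequence: str) -> bool:
    """Return True if every base of `sequence` is allowed by `pattern`.

    Both strings are upper-cased first, mirroring the rule that
    lower-case letters are mapped into upper-case.
    """
    pattern, sequence = pattern.upper(), sequence.upper()
    if len(pattern) != len(sequence):
        return False
    return all(
        base in IUPAC_NUCLEOTIDES.get(code, "")
        for code, base in zip(pattern, sequence)
    )


print(pattern_matches("RYN", "ACG"))
print(pattern_matches("RYN", "CCG"))
```

A code that is not in the table maps to the empty string here, so it never matches any base.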
For those programs that use amino acid query sequences (BLASTP and TBLASTN), the accepted amino acid codes are:
 A  alanine                      P  proline
 B  aspartate or asparagine      Q  glutamine
 C  cysteine                     R  arginine
 D  aspartate                    S  serine
 E  glutamate                    T  threonine
 F  phenylalanine                U  selenocysteine
 G  glycine                      V  valine
 H  histidine                    W  tryptophan
 I  isoleucine                   Y  tyrosine
 K  lysine                       Z  glutamate or glutamine
 L  leucine                      X  any
 M  methionine                   *  translation stop
 N  asparagine                   -  gap of indeterminate length
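
The clean-up rules for query sequences (map lower-case to upper-case, deal with digits, accept only the listed codes) might be applied to an amino acid query as sketched below. This is one possible reading, not a description of what BLAST itself does: the function name clean_protein_query is hypothetical, and the sketch removes digits and whitespace rather than replacing them with X.

```python
# Accepted one-letter codes from the amino acid table above.
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZX*-")


def clean_protein_query(raw: str) -> str:
    """Upper-case a protein query, drop digits and whitespace, then validate."""
    cleaned = "".join(
        ch for ch in raw.upper() if not ch.isdigit() and not ch.isspace()
    )
    invalid = sorted(set(cleaned) - AMINO_ACID_CODES)
    if invalid:
        raise ValueError("unsupported amino acid codes: " + ", ".join(invalid))
    return cleaned


print(clean_protein_query("  1 elrlrycapa gfallkcnda 21\n"))
```

Raising an error on unknown letters, instead of silently replacing them with X, keeps a mistyped query from being searched unnoticed.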

IUPAC-IUB/GCG Ambiguity Codes


Code   Meaning                               Complement
A      A (adenine)                           T
C      C (cytosine)                          G
G      G (guanine)                           C
T/U    T (thymine in DNA; uracil in RNA)     A
M      A or C                                K
R      A or G                                Y
W      A or T                                W
S      C or G                                S
Y      C or T                                R
K      G or T                                M
V      A or C or G                           B
H      A or C or T                           D
D      A or G or T                           H
B      C or G or T                           V
X/N    G or A or T or C                      X
.      not G or A or T or C                  .
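
The complement column of the table above is what a reverse-complement routine needs. The sketch below is a minimal illustration; the names COMPLEMENT and reverse_complement are hypothetical. Two entries are assumptions that the table does not spell out: N is mapped to N (the table lists only X as the complement of X/N), and the gap character "-" is passed through unchanged.

```python
# Complement of each code, following the table above.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "M": "K", "R": "Y", "W": "W", "S": "S", "Y": "R", "K": "M",
    "V": "B", "H": "D", "D": "H", "B": "V",
    "X": "X", ".": ".",
    "N": "N",  # assumption: N is treated as its own complement
    "-": "-",  # assumption: gaps are kept as they are
}


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement, keeping ambiguity codes consistent."""
    return "".join(COMPLEMENT[code] for code in reversed(sequence.upper()))


print(reverse_complement("ATGCRY"))
```

A letter outside the table raises a KeyError, which makes unsupported input visible instead of passing it through silently.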

Reference:
Authority:  Nomenclature Committee of the International Union of Biochemistry
Reference:  Cornish-Bowden, A.  Nucleic Acids Res. 13, 3021-3030 (1985)
IUPAC/IUB:  International Union of Pure and Applied Chemistry/International Union of Biochemistry

© 1998-2008 NAGRP - US Pig Gene Mapping Coordination Program.
Contact: NAGRP Bioinformatics Project Team
November 23, 2008 (Sunday)