The Genetic Code
Codons are read 5' -> 3' on the coding strand: first base down the left, second base across the top, third base down the right.

1st   T           C           A           G           3rd
T     Phe [F]     Ser [S]     Tyr [Y]     Cys [C]     T
      Phe [F]     Ser [S]     Tyr [Y]     Cys [C]     C
      Leu [L]     Ser [S]     Ter [end]   Ter [end]   A
      Leu [L]     Ser [S]     Ter [end]   Trp [W]     G
C     Leu [L]     Pro [P]     His [H]     Arg [R]     T
      Leu [L]     Pro [P]     His [H]     Arg [R]     C
      Leu [L]     Pro [P]     Gln [Q]     Arg [R]     A
      Leu [L]     Pro [P]     Gln [Q]     Arg [R]     G
A     Ile [I]     Thr [T]     Asn [N]     Ser [S]     T
      Ile [I]     Thr [T]     Asn [N]     Ser [S]     C
      Ile [I]     Thr [T]     Lys [K]     Arg [R]     A
      Met [M]     Thr [T]     Lys [K]     Arg [R]     G
G     Val [V]     Ala [A]     Asp [D]     Gly [G]     T
      Val [V]     Ala [A]     Asp [D]     Gly [G]     C
      Val [V]     Ala [A]     Glu [E]     Gly [G]     A
      Val [V]     Ala [A]     Glu [E]     Gly [G]     G

An explanation of the Genetic Code: DNA is a two-stranded molecule. Each strand is a polynucleotide, a linear chain of nucleotides with a specific sequence, polymerized by "dehydration" synthesis; each nucleotide carries one of the bases A (adenine), T (thymine), C (cytosine), or G (guanine). Each strand has polarity: the first nucleotide has a free 5' end and the last nucleotide has a free 3'-hydroxyl group, so we say that the strand runs 5' to 3'. It is also essential to know that the two strands of DNA are antiparallel: one strand runs 5' -> 3' while its partner runs 3' -> 5'. At each position along the double-stranded molecule, the bases on the two strands are complementary: A forms two hydrogen bonds with T, and C forms three hydrogen bonds with G. In most cases the two-stranded, antiparallel, complementary molecule winds into a helix that resembles a spiral staircase, which is why DNA is called the "double helix".
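The pairing rules in the paragraph above (A with T, C with G, strands antiparallel) can be sketched in a few lines of Python. This is an illustration only: the helper names `complement`, `reverse_complement`, `is_antiparallel_partner` and `hydrogen_bonds` are hypothetical, and the bond counts are the idealized values of 2 per A-T pair and 3 per G-C pair.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Idealized hydrogen bonds contributed by each base pair (A-T: 2, G-C: 3).
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def complement(seq: str) -> str:
    """Base-by-base complement, written in the same left-to-right order."""
    return "".join(PAIR[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """The partner strand written 5' -> 3' (antiparallel, so the complement is reversed)."""
    return complement(seq)[::-1]


def is_antiparallel_partner(top_5to3: str, bottom_5to3: str) -> bool:
    """True if two strands, each written 5' -> 3', pair along their full length."""
    return reverse_complement(top_5to3) == bottom_5to3.upper()


def hydrogen_bonds(seq: str) -> int:
    """Total hydrogen bonds in the duplex formed by seq and its complement."""
    return sum(H_BONDS[base] for base in seq.upper())


if __name__ == "__main__":
    top = "ATGGAATTCTCGCTC"
    print(complement(top))          # TACCTTAAGAGCGAG (aligned under top, 3' -> 5')
    print(reverse_complement(top))  # GAGCGAGAATTCCAT (same strand, written 5' -> 3')
    print(hydrogen_bonds(top))      # 37 (8 A-T pairs, 7 G-C pairs)
```

Writing the partner strand 5' -> 3' (the reverse complement) is the usual convention for storing sequences, which is why the sketch distinguishes it from the plain aligned complement.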
For a given gene, one strand of DNA serves as the template for RNA synthesis; this strand is often called the template strand or antisense strand (its triplets are complementary to the codons). The other, complementary strand is called the coding strand or sense strand (containing the codons). Because mRNA is synthesized complementary to the template strand, it carries the same sequence as the coding strand. The table above refers to the 5' -> 3' sequence of the coding (sense) strand of DNA; the code for the mRNA is identical except that RNA contains U (uracil) in place of T.
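A minimal sketch of the template/coding relationship, assuming the template strand is supplied 5' -> 3' (the usual storage convention). The polymerase reads the template 3' -> 5' and builds mRNA 5' -> 3', so walking the template in reverse and pairing each base (A with U, T with A, C with G, G with C) should give the coding-strand sequence with U in place of T. The function names are hypothetical.

```python
# RNA base incorporated opposite each DNA template base.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_from_template(template_5to3: str) -> str:
    """Return mRNA (5' -> 3') synthesized on a template strand given 5' -> 3'.

    The template is read 3' -> 5', so it is traversed in reverse.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template_5to3.upper()))


def coding_to_mrna(coding_5to3: str) -> str:
    """The mRNA has the coding-strand sequence, with U in place of T."""
    return coding_5to3.upper().replace("T", "U")


if __name__ == "__main__":
    coding = "ATGGAATTCTCGCTC"
    template = "GAGCGAGAATTCCAT"  # the template strand, written 5' -> 3'
    mrna = transcribe_from_template(template)
    print(mrna)                            # AUGGAAUUCUCGCUC
    print(mrna == coding_to_mrna(coding))  # True
```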
An example of two complementary strands of DNA would be:
(5' -> 3') ATGGAATTCTCGCTC   (coding, sense strand)
(3' <- 5') TACCTTAAGAGCGAG   (template, antisense strand)
(5' -> 3') AUGGAAUUCUCGCUC   (mRNA made from the template strand)

Since amino acid residues of proteins are specified by triplet codons, the protein sequence made from the above example would be Met-Glu-Phe-Ser-Leu... (MEFSL...).
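The translation step in this example can be reproduced with a lookup table. The sketch below encodes the standard genetic code from the first table as a 64-character string, with codons enumerated in T, C, A, G order at each position (TTT, TTC, TTA, TTG, TCT, ... GGG); this compact layout is a choice made for the example, not something the original prescribes. `*` marks a Ter (stop) codon, and `translate` is a hypothetical helper that accepts either DNA or mRNA and reads frame 1 only.

```python
from itertools import product

BASES = "TCAG"
# One letter per codon, in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, stop_at_ter: bool = True) -> str:
    """Translate a coding-strand DNA or mRNA sequence in reading frame 1."""
    dna = seq.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(dna) - 2, 3):  # an incomplete final codon is ignored
        aa = CODON_TABLE[dna[pos:pos + 3]]
        if aa == "*" and stop_at_ter:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGGAAUUCUCGCUC"))  # MEFSL
```

Ambiguity symbols such as N are not in `CODON_TABLE`, so this sketch raises `KeyError` on them rather than guessing.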
Sequence Symbols
GCG programs allow all upper- and lower-case letters, periods (.), asterisks (*), plus signs (+), ampersands (&), and at signs (@) as symbols in biological sequences. Nucleotide symbols, their complements, and the standard one-letter amino acid symbols are shown below in separate lists. The meanings of the symbols +, &, and @ have not been assigned as of this writing (March 1989). GCG uses the one-letter codes for amino acids and for nucleotide ambiguity proposed by the IUB (Nomenclature Committee, 1985, Eur. J. Biochem. 150: 1-5). These codes are compatible with the codes used by the EMBL, GenBank, and PIR data libraries.
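As a rough illustration of the allowed character set, the check below accepts only letters of either case plus . * + & @. It is not GCG's own code, the function names are hypothetical, and it assumes spaces, line breaks and position numbers have already been stripped from the sequence.

```python
import string

# Symbols GCG accepts in sequences: letters of either case plus . * + & @
ALLOWED = frozenset(string.ascii_letters + ".*+&@")


def invalid_symbols(seq: str) -> set[str]:
    """Characters in seq that are not allowed GCG sequence symbols."""
    return {ch for ch in seq if ch not in ALLOWED}


def is_valid_gcg_sequence(seq: str) -> bool:
    """True if every character is an allowed GCG sequence symbol."""
    return not invalid_symbols(seq)


if __name__ == "__main__":
    print(is_valid_gcg_sequence("ATGgaa.*"))  # True
    print(invalid_symbols("ATG-CC"))          # {'-'}
```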
Nucleotides
The meaning of each symbol, its complement, and its Cambridge (Staden/Sanger) equivalent are shown below. Cambridge files can be converted into GCG files, and vice versa, with the programs FromStaden and ToStaden.
IUB/GCG   Meaning                 Complement   Staden/Sanger
A         A                       T            A
C         C                       G            C
G         G                       C            G
T/U       T                       A            T
M         A or C                  K            5
R         A or G                  Y            R
W         A or T                  W            7
S         C or G                  S            8
Y         C or T                  R            Y
K         G or T                  M            6
V         A or C or G             B            not supported
H         A or C or T             D            not supported
D         A or G or T             H            not supported
B         C or G or T             V            not supported
X/N       G or A or T or C        X            -/X
.         not G or A or T or C    .            not supported

The uncertainty and frame-ambiguity codes used by Staden are not supported by GCG and are translated by FromStaden into the lower-case single-base equivalent.
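One way to see where the complement column comes from: treat each IUB symbol as the set of bases it stands for, complement every base in the set, and look up the symbol for the resulting set. The sketch below does this; `iub_complement` and `matches` are hypothetical names, X and N are both treated as "any base", and "." is omitted because it stands for no base.

```python
# Each IUB/GCG nucleotide symbol as the set of bases it can represent.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "X": "ACGT", "N": "ACGT",
}
IUB_SETS = {sym: frozenset(bases) for sym, bases in IUB.items()}
BASE_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def iub_complement(symbol: str) -> str:
    """Complement of an IUB symbol, e.g. M (A or C) -> K (G or T)."""
    target = frozenset(BASE_COMPLEMENT[b] for b in IUB_SETS[symbol.upper()])
    # Dict order makes T win over U and X win over N when both match.
    return next(sym for sym, bases in IUB_SETS.items() if bases == target)


def matches(symbol: str, base: str) -> bool:
    """True if a concrete base (A, C, G, T or U) is covered by an IUB symbol."""
    return IUB[base.upper()] in IUB_SETS[symbol.upper()]


if __name__ == "__main__":
    for sym in "MRWSYKVHDB":
        print(sym, iub_complement(sym))
```

Running the loop reproduces the complement column for the two- and three-base codes (M-K, R-Y, W-W, S-S, V-B, H-D).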
Staden code   Meaning      GCG
1             probably C   c
2             probably T   t
3             probably A   a
4             probably G   g
D             C or CC      c
V             T or TT      t
B             A or AA      a
H             G or GG      g
K             C or CX      c
L             T or TX      t
M             A or AX      a
N             G or GX      g
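The Staden-to-GCG rule amounts to a character mapping. The sketch below is a hypothetical stand-in, not GCG's FromStaden program: it maps the uncertainty codes listed above to lower-case bases and, as an assumption drawn from the nucleotide table, maps the Staden symbols 5, 6, 7, 8 and - to M, K, W, S and X. Every other character passes through unchanged.

```python
# Staden uncertainty / frame-ambiguity codes -> lower-case single-base GCG equivalent.
STADEN_UNCERTAIN = {
    "1": "c", "2": "t", "3": "a", "4": "g",  # probably C, T, A, G
    "D": "c", "V": "t", "B": "a", "H": "g",  # C or CC, T or TT, A or AA, G or GG
    "K": "c", "L": "t", "M": "a", "N": "g",  # C or CX, T or TX, A or AX, G or GX
}
# Assumed from the nucleotide table: Staden symbols whose GCG/IUB symbol differs.
STADEN_AMBIGUITY = {"5": "M", "6": "K", "7": "W", "8": "S", "-": "X"}

STADEN_TO_GCG = str.maketrans({**STADEN_UNCERTAIN, **STADEN_AMBIGUITY})


def staden_to_gcg(seq: str) -> str:
    """Hypothetical FromStaden-style conversion; unmapped characters pass through."""
    return seq.translate(STADEN_TO_GCG)


if __name__ == "__main__":
    print(staden_to_gcg("A1GD5-"))  # AcGcMX
```

Note that D, V, B, H, K, L, M and N mean something different in Staden files than in IUB/GCG files, which is why a converter must map them rather than copy them.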
Amino Acids
Here is a list of the standard one-letter amino acid codes and their three-letter equivalents, together with the synonymous codons for each amino acid and their depiction in IUB codes. You should recognize that the depictions following semicolons (;) are not specific enough to define a single amino acid, even though they represent the best possible back-translation into IUB codes! All of the relationships in this list can be redefined in a local data file, as described below.
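The semicolon remark can be checked mechanically. The sketch below collapses a set of synonymous codons into one IUB codon by taking the union of bases at each position, then expands an IUB codon back into concrete codons and asks whether they all encode the same amino acid. The names `depict`, `expand` and `is_specific` are hypothetical, and the per-position union is an assumed method; the source does not say how GCG derives its depictions, although this approach does reproduce the semicolon entries in the list.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUB/GCG nucleotide symbols; GCG writes X where IUB also allows N ("any base").
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "X": "ACGT",
}
SYMBOL_FOR = {frozenset(bases): sym for sym, bases in IUB.items()}


def depict(codons: list[str]) -> str:
    """Collapse codons into one IUB codon using the union of bases at each position."""
    return "".join(SYMBOL_FOR[frozenset(codon[i] for codon in codons)] for i in range(3))


def expand(iub_codon: str) -> list[str]:
    """List every concrete codon covered by an IUB codon such as YTR."""
    return ["".join(p) for p in product(*(IUB[sym] for sym in iub_codon.upper()))]


def is_specific(iub_codon: str) -> bool:
    """True if all covered codons encode the same amino acid (or are all stops)."""
    return len({CODON_TABLE[c] for c in expand(iub_codon)}) == 1


if __name__ == "__main__":
    leu = ["TTG", "TTA", "CTT", "CTC", "CTA", "CTG"]
    print(depict(leu))         # YTX
    print(is_specific("YTX"))  # False: also covers TTT and TTC (Phe)
    print(is_specific("YTR"))  # True
```

Applied to the six Arg codons, the six Ser codons and the three stop codons, `depict` gives MGX, WSX and TRR, the same depictions that appear after the semicolons, and `is_specific` reports each of them as non-specific.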
IUB symbol  3-letter   Meaning                    Codons                   Depiction
A           Ala        Alanine                    GCT,GCC,GCA,GCG          !GCX
B           Asp,Asn    Aspartic acid, Asparagine  GAT,GAC,AAT,AAC          !RAY
C           Cys        Cysteine                   TGT,TGC                  !TGY
D           Asp        Aspartic acid              GAT,GAC                  !GAY
E           Glu        Glutamic acid              GAA,GAG                  !GAR
F           Phe        Phenylalanine              TTT,TTC                  !TTY
G           Gly        Glycine                    GGT,GGC,GGA,GGG          !GGX
H           His        Histidine                  CAT,CAC                  !CAY
I           Ile        Isoleucine                 ATT,ATC,ATA              !ATH
K           Lys        Lysine                     AAA,AAG                  !AAR
L           Leu        Leucine                    TTG,TTA,CTT,CTC,CTA,CTG  !TTR,CTX,YTR;YTX
M           Met        Methionine                 ATG                      !ATG
N           Asn        Asparagine                 AAT,AAC                  !AAY
P           Pro        Proline                    CCT,CCC,CCA,CCG          !CCX
Q           Gln        Glutamine                  CAA,CAG                  !CAR
R           Arg        Arginine                   CGT,CGC,CGA,CGG,AGA,AGG  !CGX,AGR,MGR;MGX
S           Ser        Serine                     TCT,TCC,TCA,TCG,AGT,AGC  !TCX,AGY;WSX
T           Thr        Threonine                  ACT,ACC,ACA,ACG          !ACX
V           Val        Valine                     GTT,GTC,GTA,GTG          !GTX
W           Trp        Tryptophan                 TGG                      !TGG
X           Xxx        Unknown                                             !XXX
Y           Tyr        Tyrosine                   TAT,TAC                  !TAY
Z           Glu,Gln    Glutamic acid, Glutamine   GAA,GAG,CAA,CAG          !SAR
*           End        Terminator                 TAA,TAG,TGA              !TAR,TRA;TRR

References:
- A. Cornish-Bowden: Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences: Recommendations 1984
- IUPAC-IUB Commission on Biochemical Nomenclature (CBN): Abbreviations and Symbols for Nucleic Acids, Polynucleotides and their Constituents
- Nomenclature Committee of the International Union of Biochemistry (NC-IUB): Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences
Compiled by Zhiliang Hu