European Molecular Biology Computing Network - Biocomputing Tutorials

A Quick List of GCG and EGCG Programmes

Full documentation: genhelp or genman for the GCG programmes; egenhelp or egenman for the EGCG programmes.

Programmes organised by function: Editing; Fragment Assembly; Mapping; Sequence Comparison; Database Searching; Multiple Sequence Analysis; Pattern Recognition and Compositional Analysis; RNA Secondary Structure; Protein Sequence Analysis; Evolutionary Analysis; Translation; Manipulation; Display and Publications; Sequence Exchange; Nucleotide Analysis; Help and Miscellaneous

Editing

seqed sequence editor.

setkeys redefines the keyboard for seqed, lineup and elineup (see Multiple Sequence Analysis for lineup and elineup).

newfeatures useful for editing feature tables within sequence files.

NB: Do not edit GCG-format sequence files with an ordinary text editor. Each file records a checksum of its sequence, so the other programmes in the GCG package will treat a hand-edited file as corrupted. Use one of the GCG or EGCG sequence editors instead.
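The header line of a GCG-format file carries a "Check:" value computed from the sequence. A minimal Python sketch of that checksum, following the algorithm as it is commonly documented (for example in Biopython's Bio.SeqUtils.CheckSum.gcg); it is an illustration, not GCG's own source, and the function name is hypothetical:

```python
def gcg_checksum(seq: str) -> int:
    """GCG-style 'Check' value: position-weighted sum of character codes, mod 10000.

    Weights run 1..57 and then restart at 1; letters are upper-cased first,
    so the value does not depend on case.
    """
    total = 0
    for i, char in enumerate(seq):
        weight = (i % 57) + 1
        total += weight * ord(char.upper())
    return total % 10000


original = "ACGT"
edited = "ACGA"  # one base changed, as a text editor might do
print(gcg_checksum(original), gcg_checksum(edited))  # 748 672
```

Changing a single base changes the value, so a file edited by hand no longer matches the Check value in its own header.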

Fragment Assembly

gelstart begins a fragment assembly project.

newgelstart begins a project in the new fragment assembly system.

gelenter adds fragments to a fragment assembly project.

geloverlap compares sequences in a fragment assembly project.

gelassemble multiple sequence editor for creating contigs.

gelview displays structure of existing contigs.

geldisassemble breaks up the contigs into single fragments again.

gelmerge automatically aligns the sequences in a fragment assembly project into contigs.

gelstatus produces a summary report of the quality of each contig in a fragment assembly database.

gelpicture displays a diagram of the gel alignments and a printout of the aligned gel sequences and consensus.

gelfigure produces a graphical report of the status of a contig.

gelanalyze produces project statistics by the method of Lander and Waterman.

Mapping

map displays both strands of DNA with restriction map and protein translations.

mapplot shows all possible restriction sites in a sequence graphically.

mapsort finds and sorts restriction enzyme cuts in a DNA sequence.

mapselect selects restriction enzymes by name or by their ability to cut a given sequence.

fingerprint identifies the products of T1 ribonuclease digestion.

efingerprint version of GCG's old FingerPrint with command line control.

prime selects oligonucleotide primers for a template DNA sequence.

Sequence Comparison

compare compares two protein or nucleic acid sequences.

dotplot makes a dot plot from the output file of 'compare', 'fold' or 'stemloop'.

bestfit makes an optimal local alignment of the best segment of similarity between two sequences (method of Smith and Waterman).

gap makes an optimal global alignment of two complete sequences (method of Needleman and Wunsch).

gapshow displays the alignment of two sequences.

diverge measures the percent divergence of two protein-coding sequences by the method of Perler and Efstratiadis.

ediverge measures the percent divergence of two protein coding sequences.

overlap compares two sets of DNA sequences to each other in both orientations.

eoverlap compares two sets of DNA sequences.

bigeoverlap is a version of eoverlap with a very high limit on total sequence length.

filteroverlap filters the alignments from eoverlap, keeping only overlaps that meet a specified value.

nooverlap identifies the places where a group of nucleotide sequences do not share any common subsequences.

framealign creates an optimal alignment between a protein sequence and the codons in all possible reading frames of a nucleic acid sequence.
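gap and bestfit rest on two dynamic-programming methods: Needleman and Wunsch (a global alignment of the complete sequences) and Smith and Waterman (a local alignment of the best segment). A minimal sketch of the scoring step of both, assuming a simple match/mismatch score and a linear gap penalty; GCG's programmes use scoring matrices and separate gap creation and extension penalties, which this sketch does not reproduce. The function name and default scores are hypothetical:

```python
def alignment_score(a: str, b: str, local: bool = False,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best alignment score of a and b under a simple linear gap penalty.

    local=False: global score over both complete sequences (Needleman-Wunsch).
    local=True:  score of the best-matching segment pair (Smith-Waterman).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for an alignment ending at a[:i], b[:j].
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may start afresh anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[rows - 1][cols - 1]


query, target = "CGT", "AAAACGTAAAA"
print(alignment_score(query, target), alignment_score(query, target, local=True))  # -13 3
```

The global score is dragged down by the eight unavoidable gaps, while the local score keeps only the matching CGT segment; the only differences between the two methods are the first row and column and the floor at zero.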

Database Searching

fasta sensitive database search.

tfasta performs a sensitive search of a DNA database with a peptide query, translating the database sequences in all six frames.

fastacheck selects significant alignments from a (T)Fasta output file.

blast database search for similar sequences (see toblast).

wordsearch searches a database by the method of Wilbur and Lipman; its output is used by 'segments'.

twordsearch identifies DNA sequences similar to a protein query.

stssearch looks for primer pairs in a set of sequences.

segments aligns and displays segments of similarity found by 'wordsearch'.

tsegments aligns and displays the segments of similarity found by TWordSearch.

findpatterns looks for patterns in sequences, allows for ambiguity and mismatches

rfindpatterns FindPatterns with fixed reporting of 5' flanking residues.

patternplot graphical representation of FindPatterns.

stringsearch finds sequences by searching their comments for words e.g., "Human".

lookup does a similar keywording job to stringsearch, but faster!

names finds GCG data files and sequences by name.

fetch gets sequences from the databases & GCG data files.

newfetch copies GCG sequences or fragments or data files from the GCG database.

dataset creates a GCG database from any set of GCG format sequences.

framesearch searches a group of protein sequences for similarity to one or more nucleotide query sequences, or searches a group of nucleotide sequences for similarity to one or more protein query sequences.

Multiple Sequence Analysis

pileup creates a multiple sequence alignment from related sequences.

profalign aligns two alignments (or sequences) together.

multalign makes a simultaneous alignment of two or more DNA or protein sequences.

eclustalw calculates a multiple alignment of nucleic acid or protein sequences.

clustree computes a phylogenetic tree (from Clustalw).

sortconsensus identifies the strong consensus regions of an alignment and reports them in sorted order.

lineup multiple sequence editor.

elineup is a screen editor for editing multiple sequence alignments.

pretty displays multiple sequence alignments & consensus sequence.

prettyplot displays multiple sequence alignments and calculates a consensus sequence.

prettybox displays multiple sequence alignments graphically, with residues that agree with the consensus shown in boxes.

plotsimilarity plots running average of multiple alignment similarity.

eplotsimilarity plots the running average of the similarity among the sequences in a multiple sequence alignment.

profilemake calculates a profile of a group of sequences for 'profilesearch'.

profilesearch uses profile created by 'profilemake' to search database.

tprofilesearch uses a profile to search the nucleotide database.

profilesegments makes optimal alignments using output of 'profilesearch'.

tprofilesegments makes optimal alignments showing the segments of similarity found by TProfileSearch.

profilegap makes optimal alignment between profile and a sequence.

tprofilegap makes an optimal alignment between a profile and a sequence.

profileplot produces a graphical report of the frequency of patterns in sequences.

plotalign plots the mean and range of values for any amino acid parameter you supply.

pepallwindow plots measures of protein hydrophobicity according to the method of Kyte and Doolittle.

polydot compares two sets of sequences, draws a dotplot for each pair of sequences, and reports all identical matches of a specified length.

Pattern Recognition and Compositional Analysis

codonpreference is a frame-specific gene finder.

testcode plots a measure of the non-randomness of composition at every third base.

frames shows open reading frames for all 6 translations of DNA.

repeat finds repeats in nucleotide sequences.

erepeat version of GCG's old Repeat with command line control.

window makes a table of the frequencies of patterns within a window moved along a sequence; plot the output with 'statplot'.

ewindow version of Window with command line control.

statplot plots a set of parallel curves from a table made by 'window'.

estatplot version of StatPlot with command line control.

composition determines composition of sequences.

terminator looks for prokaryotic factor-independent RNA polymerase terminators.

eterminator version of GCG's old Terminator with command line control.

consensus creates a consensus sequence from a set of aligned sequences.

econsensus calculates a consensus sequence for a set of pre-aligned short nucleic acid sequences by tabulating the percent of G, A, T, and C for each position in the set.

fitconsensus probes a sequence for the best fit to a 'consensus' sequence.

codonfrequency tabulates codon usage; the tables are used by 'codonpreference', 'correspond' and 'frames'.

ecodonfrequency tabulates codon usage from sequences and/or existing codon usage tables.

correspond looks for similar patterns of codon usage by comparing codon tables.

ecorrespond looks for similar patterns of codon usage by comparing codon frequency tables.

palindrome searches for perfect inverted repeats in a nucleic acid sequence.
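A perfect inverted repeat is a stretch of DNA that reads the same as its own reverse complement (GAATTC, for example). A minimal sketch of the search for the simplest case, where the two arms are directly adjacent with no loop between them; this is not GCG's algorithm, and the function names and the minimum length of 6 are hypothetical:

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def perfect_palindromes(seq: str, min_len: int = 6):
    """Yield (start, end) of maximal segments equal to their reverse complement.

    start is 0-based and end is exclusive; only even-length segments with
    directly adjacent arms (no loop) are found.
    """
    seq = seq.upper()
    for centre in range(1, len(seq)):
        # Grow outwards while the left base pairs with the right base.
        left, right = centre - 1, centre
        while (left >= 0 and right < len(seq)
               and seq[left] == reverse_complement(seq[right])):
            left -= 1
            right += 1
        start, end = left + 1, right
        if end - start >= min_len:
            yield start, end


print(list(perfect_palindromes("TTGAATTCTT")))  # [(2, 8)], i.e. GAATTC
```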

RNA Secondary Structure (GCG only)

mfold predicts optimal and suboptimal RNA secondary structures.

plotfold graphically displays the output of mfold.

foldrna predicts RNA secondary structure; its output is used by 'squiggles', 'mountains', 'circles' and 'dotplot'.

squiggles uses output of 'fold' to plot RNA secondary structure nicely.

circles uses output of 'fold' to plot RNA secondary structure.

domes uses output of 'fold' to plot RNA secondary structure.

mountains uses output of 'fold' to plot RNA secondary structure.

stemloop finds inverted repeats in a sequence.

Protein Sequence Analysis

isoelectric plots the charge as a function of pH for any peptide sequence.

motifs uses the PROSITE database to find patterns in protein sequences.

profilescan uses a database of profiles to find structural motifs in proteins.

peptidemap creates a peptide map of an amino acid sequence.

peptidesort shows peptides from a digest of an amino acid sequence.

epeptidesort shows the peptide fragments from a digest of an amino acid sequence.

pepplot makes parallel plots of protein secondary structure and hydrophobicity.

peptidestructure predicts the secondary structure of a peptide; its output is used by 'plotstructure'.

plotstructure plots the output of 'peptidestructure'.

pepcoil identifies potential coiled-coil regions of protein sequences.

pepnet displays a two-dimensional helical representation of protein sequences.

pepwheel shows the periodic distribution of amino acid residues in protein sequences.

pepwindow plots measures of protein hydrophobicity (Kyte and Doolittle).

pepcount reports the number of occurrences of residues at a given position in protein sequences.

pepstats summary of the composition, molecular weight and isoelectric point of a peptide.

sigcleave uses the von Heijne method to locate signal sequences, and to identify the cleavage site.

antigenic looks for potential antigenic regions using the method of Kolaskar.

moment makes a contour plot of the helical hydrophobic moment of a peptide sequence.

helicalwheel plots a peptide structure as a helical wheel.

helixturnhelix determines the significance of possible helix-turn-helix matches in protein sequences.

dodayhoffstat compares the composition of a protein sequence against the Dayhoff statistic for protein composition.
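pepwindow and pepallwindow plot hydrophobicity by the method of Kyte and Doolittle. The usual way to build such a profile is to average residue hydropathy values over a window slid along the sequence. A minimal sketch using the published Kyte and Doolittle scale and an assumed window of 9 residues; the programmes' own defaults, smoothing and output format may differ, and the function name is hypothetical:

```python
# Kyte & Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy per window; entry i covers residues i .. i+window-1."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


print(hydropathy_profile("MKTIIALSYIFCLVFA"))
```

Peaks well above zero mark stretches of hydrophobic residues; a window of about 19 residues is often used when looking for membrane-spanning segments.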

Evolutionary Analysis

distances makes a table of pairwise distances within a group of aligned sequences.

homologies makes a table of the pair-wise distances within a group of aligned sequences.

growtree creates a phylogenetic tree from a distance matrix produced by 'distances'.

tophylip writes GCG sequences into a single file in Phylip format.

phylip2tree displays trees computed with one of the PHYLIP programs in GCG style.

newdiverge estimates the number of substitutions per site between two nucleic acid sequences that code for proteins.
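One cell of a distance table such as the one distances produces can be sketched as the proportion p of differing aligned sites, optionally corrected for multiple substitutions with the Jukes and Cantor formula d = -(3/4) ln(1 - 4p/3). Whether and how distances applies a correction is set by its own options, which this sketch does not reproduce; skipping gap columns ('-' or '.') is also an assumption, and the function names are hypothetical:

```python
import math


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned, ungapped sites at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in "-." and y not in "-."]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor substitutions per site; undefined once p reaches 0.75."""
    if p >= 0.75:
        raise ValueError("p too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTAA")
print(p, jukes_cantor(p))  # 0.1 and about 0.107
```

The corrected distance is always a little larger than p, because some sites are assumed to have changed more than once.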

Translation

translate translates nucleotide sequences into peptide sequences.

etranslate version of GCG's old Translate program with command line control added.

transall is a simple script that writes all six reading frames into separate files.

pepdata translates the coding regions of GenBank DNA entries into peptides; it can also translate all six frames.

alltrans translates a set of aligned nucleotide sequences into protein.

mytrans translates part of a nucleotide sequence into protein.

backtranslate creates a nucleotide sequence from an amino acid sequence.

extractpeptide writes a peptide sequence from the output of 'map'.

eextractpeptide version of ExtractPeptide with command line control.
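Programmes such as transall translate a DNA sequence in all six reading frames: three offsets on the given strand and three on its reverse complement. A minimal sketch, assuming the standard genetic code only (no alternative codes) and writing 'X' for any codon containing a character other than A, C, G, T or U; the function names and the +1..+3 / -1..-3 frame labels are this sketch's own convention:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons listed in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from position 0; trailing bases are ignored."""
    dna = dna.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict[str, str]:
    """Translations of the three forward and three reverse-complement frames."""
    dna = dna.upper().replace("U", "T")
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


print(six_frames("ATGAAA"))
```

Writing each entry of the returned dictionary to its own file would give the kind of output the transall entry describes.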

Manipulation

assemble concatenates sequences.

eassemble command line control added to GCG's assemble.

reverse reverses and/or complements sequences.

ereverse command line control added to GCG's reverse.

shuffle randomises sequences.

simplify simplifies peptide sequences into broad amino acid categories.

comptable creates symbol comparison tables, e.g., for simplify.

ecomptable command line control added to GCG's comptable.

corrupt randomly introduces small errors into a nucleotide sequence.

pepcorrupt randomly introduces small substitutions, insertions, and deletions into a protein sequence.

sample extracts sequence fragments randomly from sequences.

Display and Publications (GCG only)

plasmidmap draws a circular plot of a plasmid construct.

figure makes figures and posters. It can include graphics from other GCG programmes.

red text formatter.

publish arranges sequences for publication.

Sequence Exchange

reformat makes sequence files readable by GCG programmes.

creformat reformats sequence file(s), scoring matrix file(s), or enzyme data file(s).

fromstaden reformats sequence files from Staden format to GCG.

tostaden reformats sequence files from GCG to Staden format.

efromstaden reformats staden sequence(s) from fasta into GCG format files.

etostaden version of GCG's ToStaden with command line control.

fromig reformats sequence files from Intelligenetics (SEQ) format to GCG.

toig reformats sequence files from GCG to Intelligenetics (SEQ) format.

fromembl reformats sequence files from EMBL format to GCG.

toembl reads an EMBL entry from a GCG sequence database and writes it in EMBL file format.

fromgenbank reformats sequence files from GenBank format to GCG.

togenbank reads a GenBank entry from a GCG sequence database and writes it in GenBank file format.

frompir reformats sequence files from PIR format to GCG.

topir reformats sequence files from GCG to PIR format.

topirall converts a list of sequences into PIR format.

fromfasta reformats sequence files from fasta format to GCG.

efromfasta reformats FASTA-format sequence(s) into GCG-format files.

toblast combines any set of GCG sequences into a database that you can search with blast.

totext converts a sequence into plain text format.

toprimer formats a GCG sequence file into a PRIMER compatible file.

torelate creates an input file for the NBRF RELATE program.

egetseq version of GCG's GetSeq with command line control.

Nucleotide Analysis (EGCG only)

melt calculates the melting temperature and G+C % of nucleic acid sequences.

meltplot plots the melting curve for a nucleic acid sequence.

basepairplot plots the percentage occurrence and the observed over expected frequency of a dinucleotide pair relative to its position in a nucleic acid sequence.

cpgplot plots the frequency of occurrence of CpG dinucleotides and the C and G percentage relative to their position in a sequence.

cpgreport looks for potential CpG islands in a nucleotide sequence.

chaos makes a CHAOS game representation of a nucleic acid sequence.

codfish calculates a set of codon usage statistics for a sequence using a specified codon usage table.

wordcount counts the commonest words in a sequence and reports them in order of frequency and sequence.

wordup detects statistically significant oligonucleotide patterns from six to nine nucleotides long in the sequences under investigation.

poland simulates transition curves of double-stranded DNA.

genetrans extracts and/or translates coding regions as defined in the feature table of sequences stored in the EMBL or GenBank databases.

gapframe moves all gaps in a DNA sequence reading frame to be at codon boundaries.

prima selects oligonucleotide primers for a template DNA sequence.

quicktandem scans for potential tandem repeats in a nucleotide sequence.

tandem looks for multiple tandem repeats of a given size in a nucleotide sequence.

inverted looks for imperfect inverted repeats in a nucleotide sequence.

ecomposition determines the composition of sequence(s).
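cpgplot reports the observed over expected ratio of CpG dinucleotides alongside C+G content. A common definition of that ratio (Gardiner-Garden and Frommer) is (number of CpG x window length) / (number of C x number of G). A sliding-window sketch under that assumption; the window and step sizes are illustrative, not the programme's defaults, and the function name is hypothetical:

```python
def cpg_windows(seq: str, window: int = 100, step: int = 1):
    """Yield (start, gc_fraction, cpg_obs_exp) for each window (start is 0-based)."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        # "CG" cannot overlap itself, so str.count finds every CpG.
        cpg = w.count("CG")
        obs_exp = cpg * window / (c * g) if c and g else 0.0
        yield start, (c + g) / window, obs_exp


for start, gc, ratio in cpg_windows("ACGTCGCGTTACGA" * 20, window=100, step=50):
    print(start, round(gc, 2), round(ratio, 2))
```

A CpG-island report in the style of cpgreport would then look for runs of windows where both values stay above chosen thresholds.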

Help and Miscellaneous

genhelp displays the help files of these programmes.

egenhelp help organized by program name.

egenman help organized by function categories.

8.1_Whats_New what's new in version 8.1-unix of the EGCG extensions to the GCG package.

setplot allows you to change the device that plots come out on.

showplot shows what your graphics device is currently set to.

plottest plots a test pattern for you to check your graphics output.

echokey shows the decimal value of each key you press.

fonts draws tables showing each of the fonts used.

wpi & runs the graphical Wisconsin Package Interface.

wpi -small & runs the small screen version.

Comments? Questions? Accolades?
Please send them to David Featherston ( dwf@biobase.dk )
