Cap3 is a fragment assembly program written by Xiaoqiu Huang <email@example.com>.
The quick-and-dirty on how to use it is:
(1) log in to the Unix computer on which it is installed.
(2) combine all of your sequence fragments into a single FASTA file.
(3) run the command:
cap3 frag.file > cap3.log
The output is a file of contigs (name ending in .contigs) and a file of unused fragments (name ending in .singlets). The file cap3.log records the details of how and why the contigs were selected.
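The combining step can be sketched in Python as follows. This is a minimal illustration, not part of CAP3: the function names combine_fasta and cap3_command are made up for this example, and the only cap3 invocation assumed is the one shown above (cap3 followed by the reads file).

```python
from pathlib import Path


def combine_fasta(inputs, output):
    """Concatenate several FASTA files into one; return the number of records."""
    count = 0
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # keep the next file's header on its own line
            count += sum(1 for line in text.splitlines() if line.startswith(">"))
            out.write(text)
    return count


def cap3_command(reads_file):
    """Argument list for 'cap3 reads_file'; the caller redirects stdout to a log."""
    return ["cap3", str(reads_file)]
```

To run the assembly from Python, the list returned by cap3_command can be passed to subprocess.run with stdout pointed at an open cap3.log file. That only works where cap3 is installed and on the PATH.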
The following is taken directly from the cap3 README file written by the author.
Detailed documentation on CAP3 usage.
Usage: cap3 File_of_reads [options]
File_of_reads is a file of DNA reads in FASTA format
If the file of reads is named 'xyz', then
the file of quality values must be named 'xyz.qual',
and the file of constraints named 'xyz.con'.
Options (default values):
-a N specify band expansion size N > 10 (20)
-b N specify base quality cutoff for differences N > 15 (20)
-c N specify base quality cutoff for clipping N > 5 (10)
-d N specify max qscore sum at differences N > 100 (250)
-e N specify extra number of differences N > 10 (20)
-g N specify gap penalty factor N > 0 (6)
-m N specify match score factor N > 0 (2)
-n N specify mismatch score factor N < 0 (-5)
-o N specify overlap length cutoff N > 20 (30)
-p N specify overlap percent identity cutoff N > 65 (75)
-s N specify overlap similarity score cutoff N > 100 (500)
-u N specify min number of constraints for correction N > 0 (4)
-v N specify min number of constraints for linking N > 0 (2)
-x N specify prefix string for output file names (cap)
If no quality file is given, then a default quality value of 10 is used for each base.
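The bounds in the option table can be checked before cap3 is launched. The sketch below copies the bounds and defaults from the table above; the helper name cap3_args is hypothetical, and the -x option is left out because it takes a string rather than a number.

```python
# flag -> (comparison, bound, default), copied from the option table above
OPTIONS = {
    "a": (">", 10, 20),
    "b": (">", 15, 20),
    "c": (">", 5, 10),
    "d": (">", 100, 250),
    "e": (">", 10, 20),
    "g": (">", 0, 6),
    "m": (">", 0, 2),
    "n": ("<", 0, -5),
    "o": (">", 20, 30),
    "p": (">", 65, 75),
    "s": (">", 100, 500),
    "u": (">", 0, 4),
    "v": (">", 0, 2),
}


def cap3_args(reads_file, **values):
    """Build a cap3 argument list, rejecting values outside the listed bounds."""
    args = ["cap3", reads_file]
    for flag, value in values.items():
        if flag not in OPTIONS:
            raise ValueError(f"unknown option -{flag}")
        op, bound, _default = OPTIONS[flag]
        ok = value > bound if op == ">" else value < bound
        if not ok:
            raise ValueError(f"-{flag} must be {op} {bound}, got {value}")
        args += [f"-{flag}", str(value)]
    return args
```

For example, cap3_args("frag.file", p=90, o=40) gives the argument list for "cap3 frag.file -p 90 -o 40", while p=60 raises an error because the table requires N > 65.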
CAP3 takes as input a file of sequence reads in FASTA format. If the names of
reads contain a dot ('.'), CAP3 requires that the names of reads sequenced from
the same subclone contain the same substring up to the first dot. CAP3 takes two
optional files: a file of quality values in FASTA format and a file of forward-reverse constraints.
The file of quality values must be named "xyz.qual", and the file of forward-reverse constraints must be named "xyz.con", where "xyz" is the name of the sequence file. CAP3 uses the same format of a quality file as Phrap.
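To make the naming rule and the quality file layout concrete, here is a small sketch. It assumes the usual Phrap-style layout: a ">name" header line followed by one whitespace-separated integer score per base. The function names are made up for this example. Writing a uniform file like this is not needed to get the default value of 10; it only illustrates the layout.

```python
def read_fasta(text):
    """Return [(name, sequence)]; the name is the first word of each header."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def uniform_qual(fasta_text, value=10, per_line=20):
    """Quality text giving every base of every read the same score."""
    lines = []
    for name, seq in read_fasta(fasta_text):
        lines.append(f">{name}")
        scores = [str(value)] * len(seq)
        for i in range(0, len(scores), per_line):
            lines.append(" ".join(scores[i:i + per_line]))
    return "\n".join(lines) + "\n"


def companion_names(reads_file):
    """Names CAP3 expects for the quality and constraint files of reads_file."""
    return reads_file + ".qual", reads_file + ".con"
```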
Each line of the constraint file specifies one forward-reverse constraint of the form:
ReadA ReadB MinDistance MaxDistance
where ReadA and ReadB are names of two reads, and MinDistance and MaxDistance are distances (integers) in base pairs. The constraint is satisfied if ReadA in forward orientation occurs in a contig before ReadB in reverse orientation, or ReadB in forward orientation occurs in a contig before ReadA in reverse orientation, and their distance is between MinDistance and MaxDistance. CAP3 works better when more constraints are supplied.
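The constraint line format and the satisfaction rule can be sketched as below. The parsing follows the four-field format given above. The check is only an illustration: the Placement type is hypothetical, "before" is taken to mean a smaller start coordinate, and the distance is assumed to run from the start of the forward read to the end of the reverse read. How CAP3 itself measures the distance may differ.

```python
from typing import NamedTuple


class Constraint(NamedTuple):
    read_a: str
    read_b: str
    min_distance: int
    max_distance: int


class Placement(NamedTuple):  # hypothetical: position of a read in a contig
    start: int
    end: int
    forward: bool


def parse_constraint(line):
    """Parse one 'ReadA ReadB MinDistance MaxDistance' line."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    c = Constraint(fields[0], fields[1], int(fields[2]), int(fields[3]))
    if c.min_distance > c.max_distance:
        raise ValueError(f"MinDistance exceeds MaxDistance: {line!r}")
    return c


def is_satisfied(c, placements):
    """True if one read lies forward before the other in reverse, within range."""
    a, b = placements[c.read_a], placements[c.read_b]
    for fwd, rev in ((a, b), (b, a)):
        if fwd.forward and not rev.forward and fwd.start < rev.start:
            distance = rev.end - fwd.start  # assumed distance measure
            if c.min_distance <= distance <= c.max_distance:
                return True
    return False
```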
We have a separate program named "formcon" to generate a constraint file from the sequence file. The program takes an input file of fragments in FASTA format and two integers (minimum distance and maximum distance in bp). The minimum and maximum distances specify a lower and an upper limit on the subclone length, respectively. It produces a file of forward-reverse constraints for CAP3. It is assumed that the names of a pair of forward and reverse reads contain a dot and are identical up to the first dot. Because CAP3 uses reads whose ends are clipped, instead of raw reads, to measure their distance, the distance seen by CAP3 could differ from the insert size by 1000 to 1500 bp. For example, if the insert size is 2000 to 3000 bp, we recommend that you use 500 for the minimum distance and 4000 for the maximum distance. The results are written to a file whose name ends in ".con".
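The pairing rule and the distance padding can be illustrated with a short sketch. This is not the formcon source. The text above does not say how forward and reverse reads are told apart, so the sketch assumes a hypothetical convention in which the part of the name after the first dot starts with "f" for forward reads and "r" for reverse reads. The padding of 1500 bp below and 1000 bp above is read off the single worked example (2000 to 3000 becomes 500 to 4000) and is likewise an assumption.

```python
def padded_distances(insert_min, insert_max, below=1500, above=1000):
    """Widen an insert-size range to allow for end clipping (assumed padding)."""
    return max(0, insert_min - below), insert_max + above


def form_constraints(read_names, min_distance, max_distance,
                     forward_tag="f", reverse_tag="r"):
    """Pair reads sharing a name up to the first dot; return constraint lines."""
    groups = {}
    for name in read_names:
        if "." not in name:
            continue  # reads without a dot cannot be paired
        prefix, suffix = name.split(".", 1)
        group = groups.setdefault(prefix, {"f": [], "r": []})
        if suffix.startswith(forward_tag):  # hypothetical naming convention
            group["f"].append(name)
        elif suffix.startswith(reverse_tag):
            group["r"].append(name)
    lines = []
    for group in groups.values():
        for fwd in group["f"]:
            for rev in group["r"]:
                lines.append(f"{fwd} {rev} {min_distance} {max_distance}")
    return lines
```

With an insert size of 2000 to 3000 bp, padded_distances(2000, 3000) returns (500, 4000), matching the recommendation above.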
The complete help files for cap3, with more details, are located on genome.chmcc.org in the file /usr/local/gcg/doc/cap3.help