README.TXT OASES SOURCE Feb 1, 2010 Daniel Zerbino (dzerbino@soe.ucsc.edu) Marcel Schulz (marcel.schulz@molgen.mpg.de) > SUMMARY * A/ REQUIREMENTS * B/ COMPILING INSTRUCTIONS * C/ RUNNING * D/ OUTPUT FILES * E/ OPTIONS ---------------------------------------------------------------------------------- A/ REQUIREMENTS Oases should function on any standard 64bit Linx environment with gcc. A good amount of physical memory (12GB to start with, more is no luxury) is recommended. Before trying to compile Oases, you must install the Velvet package: www.ebi.ac.uk/~zerbino/velvet/ . Keep note of the directory in which you install Velvet. ---------------------------------------------------------------------------------- B/ COMPILING INSTRUCTIONS Normally, with a GNU environment, just type: > make 'VELVET_DIR=/path/to/velvet' Note that you need to communicate all the Velvet compilation settings during the Oases compilation. Therefore, if you want to make a debugging colorspace version of Oases with a maximum kmer length of 63 and 5 short-read libraries, the commandline becomes: > make colordebug 'VELVET_DIR=/path/to/velvet' 'MAXKMERLENGTH=63'\ 'CATEGORIES=5' ---------------------------------------------------------------------------------- C/ RUNNING You must first process the reads using Velvet: * you must choose a hash length at this stage (cf. the Velvet manual), * DO NOT set a coverage cutoff, you should set that when running oases, * DO NOT set an expected coverage, * remember to turn on the -read_trkg option when running velvetg. As an example: > velveth new_directory 21 -shortPaired data/test_reads.fa > velvetg new_directory -read_trkg yes You can now run Oases on the Velvet working directory which has just been created. Provide all the information about insert lengths and their standard deviation as possible (identical to Velvet): > oases new_directory -ins_length 200 ---------------------------------------------------------------------------------- D/ OUTPUT FILES Oases produces a number of output files, which correspond to the different algorithms being run succesively on the data. In the above example, you would find: new_directory/transcripts.fa A FASTA file containing the transcripts imputed directly from trivial clusters of contigs (loci with less than two transcripts and Confidence Values = 1) and the highly expressed transcripts imputed by dynamic programming (loci with more than 2 transcripts and Confidence Values <1). new_directory/splicing_events.txt A hybrid file which describes the contigs contained in each locus in FASTA format, interlaced with one line descriptions of splicing events using the AStalavista nomenclature*. new_directory/contig-ordering.txt A hybrid file which describes the contigs contained in each locus in FASTA format, interlaced with one line summaries of the transcripts listed in transcripts.fa . Each line is a string of atoms defined as: $contig_id:$cumulative_length-($distance_to_next_contig)-> Here the cumulative length is the total length of the transcript assembly from its 5' end to the 3' end of that contig. This allows you to locate the contig sequence within the transcript sequence. A file describing the transcripts imputed directly from trivial clusters. Its format is identical to the file described previously. * Sammeth, Michael, Foissac, Sylvain Guigó, Roderic, 'A General Definition and Nomenclature for Alternative Splicing Events', PLoS Comput Biol , vol. 4, no. 8, e100014y+ (2008). ---------------------------------------------------------------------------------- E/ OPTIONS The behavior of Oases can be modified using the following options: -min_trans_length simple threshold on output transcript length -cov_cutoff minimum number of times a k-mer has to be observed to be used in the assembly (just like in Velvet) [default=3] -min_pair_cov minimum number of times two contigs must be connected by reads or read pairs to be clustered together [default=4] -paired_cutoff minimum ratio between the numbers of observed and expected connecting read pairs between two contigs [default=0.1] -scaffolding allows you to prevent the creation of gapped transcripts E.g.: > oases new_directory -ins_length 200 -cov_cutoff 3 -min_pair_count 4 Running: > oases --help will produce a short help message