Start from the raw sequences, for example, the file Input_raw_seqs.fastq.

1. ./process_Rawseqs.pl Enzyme_code \
       number_of_base_cut_at_the_end \
       Barcode_file.txt \
       Input_raw_seqs.fastq \
       Output_selected_seqs.fastq

   Repeat this step for all your raw sequence files.
   Note: use only the barcodes you want to select for each file.

2. ./BarcodeSplit.pl RAD Barcodes.txt *.fastq

   Note: all the fastq results from step 1 should be in the same folder,
   and Barcodes.txt contains the codes for all samples.
   This produces one RAD_barcode.fasta file for each sample.

3. ./HashSeqs.pl *.fasta

   This produces the *.hash files.

4. $ RepeatMasker -lib INRArepbase1.txt -pa 8 sample.hash

   You need to have RepeatMasker installed. Run this step for all the
   *.hash files.
   Note: -pa 8 means use 8 processors. You can change this number based
   on your computer's configuration.

5. ./list_unmasked.pl *.masked

   The *.masked files are generated by RepeatMasker in step 4.

6. ./select_hash.pl 200 test.fna *.unmasked

   Note: we chose 200 as the upper limit of reads, but you can use any
   number. The *.unmasked files are generated in step 5. The file
   test.fna is the output of this step. It contains all the selected
   unmasked sequences.

7. ./re-hash.pl test test.fna test_rehash.fna test_rehash.dat

   Here test.fna is the file generated in step 6.

8. ./novoindex \
       test_rehash.idx \
       test_rehash.fna

9. $ novoalign -r E 20 -t 250 -F FA -d test_rehash.idx \
       -f test_rehash.fna > test_rehash.novo &

10. ./find_bi_allelic.pl test_rehash.novo test_biAllelic.txt 90

    Note: a value of 90 allows up to 3 mismatches between the two
    alleles, and 30 allows 1 mismatch.

11. ./call_snp.pl samples.txt \
        test_rehash.dat \
        test_biAllelic.txt \
        test_genotype.txt \
        test_genotype.fna 1

    Note: the last parameter selects which loci are output. With 1, the
    output contains the loci that have 1 SNP; with 3, it contains the
    loci that have up to 3 SNPs.

--
Gao, Guangtu
Guangtu.Gao@ARS.USDA.GOV
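
Steps 7 to 11 reuse the same file-name stem ("test") throughout, so when changing it, it can help to print the exact command lines before running anything. The following is only a sketch and not part of the pipeline. The function names downstream_commands and render are hypothetical, and it assumes the first argument of re-hash.pl is the same stem used in the other file names. It only builds and prints the commands; it does not run them.

```python
import shlex


def downstream_commands(prefix, samples="samples.txt", score=90, max_snps=1):
    """Build the step 7-11 command lines for one file-name stem.

    Returns a list of (argv, stdout_path) pairs; stdout_path is None
    unless the step redirects its output to a file.
    Assumption: `prefix` plays the role of "test" in the steps above.
    """
    fna = f"{prefix}.fna"
    rehash_fna = f"{prefix}_rehash.fna"
    rehash_dat = f"{prefix}_rehash.dat"
    idx = f"{prefix}_rehash.idx"
    novo = f"{prefix}_rehash.novo"
    bi_allelic = f"{prefix}_biAllelic.txt"
    return [
        # Step 7: re-hash the selected unmasked sequences.
        (["./re-hash.pl", prefix, fna, rehash_fna, rehash_dat], None),
        # Step 8: build the novoalign index.
        (["./novoindex", idx, rehash_fna], None),
        # Step 9: align; options copied verbatim from the step above.
        (["novoalign", "-r", "E", "20", "-t", "250", "-F", "FA",
          "-d", idx, "-f", rehash_fna], novo),
        # Step 10: find bi-allelic loci (90 = up to 3 mismatches).
        (["./find_bi_allelic.pl", novo, bi_allelic, str(score)], None),
        # Step 11: call SNPs; the last argument is the SNP limit per locus.
        (["./call_snp.pl", samples, rehash_dat, bi_allelic,
          f"{prefix}_genotype.txt", f"{prefix}_genotype.fna",
          str(max_snps)], None),
    ]


def render(argv, stdout_path=None):
    """Format one command as a shell line, with optional redirection."""
    line = shlex.join(argv)
    if stdout_path is not None:
        line += " > " + shlex.quote(stdout_path)
    return line


# Dry run: print the commands for the stem used in the steps above.
for argv, out in downstream_commands("test"):
    print(render(argv, out))
```

With the stem "test" and the default values, the printed lines match the commands in steps 7 to 11, except that the novoalign line is not put in the background with "&".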