From jillian.maddoxalumni.unimelb.edu.au Tue Jan 17 18:03:12 2012
From: Jill Maddox <jillian.maddoxalumni.unimelb.edu.au>
To: Multiple Recipients of <crimap-usersanimalgenome.org>
Subject: Bash shell script for parsing chrompic files to show only phase
lines with highest recombinations
Date: Tue, 17 Jan 2012 18:03:12 -0600
Hi all
For those of you using Linux/Unix here is a shell script that might be of
use. It parses the output from chrompic and shows all lines from a chrompic
file that have more than a specified number of recombinations. The output
is sorted with the most recombinants at the bottom. Number of recombinants
and animal id are on the RHS. This enables you to scan through the file to
see whether double recombinants or phase changes line up - an indication of
wrong order. The position is the horizontal cursor position - corresponds
to the number in the output below the words "Sex_averaged map" or "Sex-
specific map". The minimum number of loci for the script to work is 10. The
script (between the ============ lines) needs to be saved as an executable
file (use chmod) somewhere in your path with a filename that you will
remember - I have it as chrompic_recomb_ord.sh. This assumes that you have
saved the chrompic
output to a file. The script also assumes that you have the gawk form
of awk in your path. If the machine you use doesn't have gawk then
either get it installed or try replacing gawk with nawk (if you have
that) or awk (please note that I haven't checked whether it works
with nawk or awk).
Usage: chrompic_recomb_ord.sh chrompic_filename min_number_of_recombinants
e.g. chrompic_recomb_ord.sh chr3.chrompic 3
========================================================================
#!/bin/bash
# chrompic_recomb_ord.sh
# sort lines containing more than recombinants and order from lowest
# to highest
# Input: chrompic filename, minimum number of recombinations
# Output: chrompic filename with .recomb suffix
if [ $# -ne 2 ]; then
echo 1>&2 Usage: $0 chrompic_filename min_num_recomb
exit 127
fi
if ! [ -a ./${1} ]; then
echo "file ${1} not found"
exit 127
fi
if ! [[ "$2" =~ ^[0-9]+$ ]]; then
echo "$2 is not a suitable number"
exit 127
fi
egrep " [-0o1i]{10} " $1 | sed 's/ \([0-9]*\)$/\t\1/g' | \
gawk -v var=$2 'BEGIN{FS=OFS="\t"; line1=0; minrec=var;} \
{if (line1 == 0) {line1 = 1; num_in_line=split($1, lineinfo, " "); \
ind = lineinfo[1]; \
if ($2 >= minrec){for (i=2; i <= num_in_line; i++) \
printf "%s", lineinfo[i];printf "\t%d\t%d\n", $2, ind;} \
else next;} \
else {line1 = 0; if ($2 >= minrec) \
{num_in_line=split($1, lineinfo, " "); \
for (i=2; i <= num_in_line; i++) printf "%s", lineinfo[i]; \
printf "\t%d\t%d\n", $2, ind;}}}' | \
sort -n -k2 -n -k3 > ${1}.recomb
exit
==================================================================================
The script could be improved on by also adding the family id. If someone
wants to do an updated version that includes the family then please e-mail
it to the users groups.
Regards
Jill
***************************************************************
Jill Maddox 16 Park Square Port Melbourne, 3207 Australia phone: 03 9646
0428 E-mail: jillian.maddoxalumni.unimelb.edu.au
***************************************************************
|