CRI-MAP Users Forum Posted mail
From jillian.maddoxalumni.unimelb.edu.au  Tue Jan 17 18:03:12 2012
From: Jill Maddox <jillian.maddoxalumni.unimelb.edu.au>
To: Multiple Recipients of <crimap-usersanimalgenome.org>
Subject: Bash shell script for parsing chrompic files to show only phase
  lines with highest recombinations
Date: Tue, 17 Jan 2012 18:03:12 -0600

Hi all

For those of you using Linux/Unix here is a shell script that might be of
use. It parses the output from chrompic and shows all lines from a chrompic
file that have more than a specified number of recombinations. The output
is sorted with the most recombinants at the bottom. Number of recombinants
and animal id are on the RHS. This enables you to scan through the file to
see whether double recombinants or phase changes line up - an indication of
wrong order. The position is the horizontal cursor position - corresponds
to the number in the output below the words "Sex_averaged map" or "Sex-
specific map". The minimum number of loci for the script to work is 10. The
script (between the ============ lines) needs to be saved as an executable
file (use chmod) somewhere in your path with a filename that you will
remember - I have it as chrompic_recomb_ord.sh. This assumes that you have
saved the chrompic
output to a file. The script also assumes that you have the gawk form  
of awk in your path. If the machine you use doesn't have gawk then  
either get it installed or try replacing gawk with nawk (if you have  
that) or awk (please note that I haven't checked whether it works  
with nawk or awk). 

Usage: chrompic_recomb_ord.sh chrompic_filename min_number_of_recombinants
e.g. chrompic_recomb_ord.sh chr3.chrompic 3

======================================================================== 
#!/bin/bash 
# chrompic_recomb_ord.sh 
# sort lines containing more than recombinants and order from lowest 
# to highest 
# Input: chrompic filename, minimum number of recombinations 
# Output: chrompic filename with .recomb suffix 
if [ $# -ne 2 ]; then 
     echo 1>&2 Usage: $0 chrompic_filename min_num_recomb 
     exit 127 
fi 
if ! [ -a ./${1} ]; then 
    echo "file ${1} not found" 
    exit 127 
fi 
if ! [[ "$2" =~ ^[0-9]+$ ]]; then 
     echo "$2 is not a suitable number" 
     exit 127 
fi 

egrep " [-0o1i]{10} " $1 | sed 's/ \([0-9]*\)$/\t\1/g' | \
  gawk -v var=$2 'BEGIN{FS=OFS="\t"; line1=0; minrec=var;} \
  {if (line1 == 0) {line1 = 1; num_in_line=split($1, lineinfo, " "); \
        ind = lineinfo[1]; \ 
     if ($2 >= minrec){for (i=2; i <= num_in_line; i++) \ 
         printf "%s", lineinfo[i];printf "\t%d\t%d\n", $2, ind;} \ 
     else next;} \ 
  else {line1 = 0; if ($2 >= minrec) \ 
    {num_in_line=split($1, lineinfo, " ");  \ 
    for (i=2; i <= num_in_line; i++) printf "%s", lineinfo[i]; \ 
    printf "\t%d\t%d\n", $2, ind;}}}' | \ 
sort -n -k2 -n -k3 > ${1}.recomb 
exit 
================================================================================== 

The script could be improved on by also adding the family id. If someone
wants to do an updated version that includes the family then please e-mail
it to the users groups.

 
Regards 

 
Jill 

 
*************************************************************** 

Jill Maddox 16 Park Square Port Melbourne, 3207 Australia phone: 03 9646
0428 E-mail: jillian.maddoxalumni.unimelb.edu.au

*************************************************************** 


 

 

© 2003-2024: USA · USDA · NRPSP8 · Program to Accelerate Animal Genomics Applications. Contact: Bioinformatics Team