Supplementary Data
Data Repository


Mohammed Al Abri and Samantha Brooks

Whole Genome Detection of Sequence and Structural Polymorphisms in Six Diverse Horses

bioRxiv, doi:

 File                                          Size   Date
 Annotations GATK SNPs Indels.vcf.gz      755.61 MB   2016-01-12 12:34:41
 Annotations GATK SNPs Indels.vcf.gz.tbi    1.62 MB   2016-01-12 12:35:15
 Annotations SV+CNV.txt.gz                419.82 KB   2017-07-25 15:16:53
 CNV tracks AMH Bed.zip                     5.61 MB   2017-08-07 00:49:08
 CNV tracks ARB Bed.zip                     6.68 MB   2017-08-07 00:49:24
 CNV tracks CH Bed.zip                      4.88 MB   2017-08-07 00:50:06
 CNV tracks MM Bed.zip                      5.26 MB   2017-08-07 00:49:50
 CNV tracks PER Bed.zip                     7.47 MB   2017-08-07 00:49:38
 CNV tracks TWH Bed.zip                     6.46 MB   2017-08-07 00:48:54
 SV tracks Inter                                 KB   2015-07-21 12:40:03
 VCF SNPs Indels combined.vcf.gz          708.98 MB   2015-06-06 17:56:46
 VCF SNPs Indels combined.vcf.gz.tbi        1.56 MB   2015-06-06 18:14:01

Contact: Mohammed Al Abri ( 
         Samantha Brooks (

File Descriptions

The data files in this location describe the variants detected in six horses,
one from each of six breeds.  The acronyms used in the file names refer to
the following horse breeds:

    CH: Native Mongolian Chakouyi Horse
    MM: Mangalarga Marchador Horse
   AMH: American Miniature Horse
   ARB: Arabian Horse
   PER: Percheron Horse
   TWH: Tennessee Walking Horse 

o Annotation Files
  The annotation files describe the SNPs, INDELs, SVs, and CNVs detected.
  SNPs and INDELs were annotated with SnpEff v4.0.  CNVs and SVs were
  annotated by overlapping their breakpoints with Ensembl genes using
  BEDTools (v2.23.0).
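  The breakpoint-to-gene annotation above rests on a simple interval-overlap
  test.  The sketch below is not the BEDTools implementation; it only shows
  the overlap rule that `bedtools intersect` applies to BED-style (0-based,
  half-open) coordinates.  The gene names and coordinates are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive (BED convention)
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def annotate_breakpoints(breakpoints, genes):
    """Map each breakpoint name to the sorted names of the genes it overlaps.

    A brute-force scan over all pairs: fine for a handful of records, whereas
    bedtools is built for genome-scale inputs.
    """
    return {
        bp.name: sorted(g.name for g in genes if overlaps(bp, g))
        for bp in breakpoints
    }


# Hypothetical gene models and breakpoints, for illustration only.
genes = [
    Interval("chr1", 1000, 5000, "GENE_A"),
    Interval("chr1", 4500, 9000, "GENE_B"),
    Interval("chr2", 100, 800, "GENE_C"),
]
breakpoints = [
    Interval("chr1", 4600, 4601, "bp1"),  # falls inside GENE_A and GENE_B
    Interval("chr1", 9000, 9001, "bp2"),  # starts at GENE_B's exclusive end
    Interval("chr2", 500, 501, "bp3"),    # falls inside GENE_C
]
print(annotate_breakpoints(breakpoints, genes))
# {'bp1': ['GENE_A', 'GENE_B'], 'bp2': [], 'bp3': ['GENE_C']}
```

  Because BED ends are exclusive, a breakpoint starting exactly at a gene's
  end coordinate (bp2) does not count as an overlap.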

  The SV and CNV annotation file is plain text and can be viewed in a
  standard text editor such as Notepad++.  SV types are labeled SV_INTER for
  interchromosomal translocations and SV_INTRA for intrachromosomal
  translocations.
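  The SV_INTER/SV_INTRA distinction depends only on whether the two
  breakpoints lie on the same chromosome.  A minimal sketch follows; the
  helper names are hypothetical, and stripping a leading "chr" is an
  assumption made so that UCSC-style ("chr1") and Ensembl-style ("1")
  chromosome names compare equal.

```python
def normalize_chrom(chrom: str) -> str:
    """Drop a leading 'chr' so 'chr1' (UCSC style) and '1' (Ensembl style) match."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def sv_label(chrom1: str, chrom2: str) -> str:
    """SV_INTRA if both breakpoints lie on one chromosome, else SV_INTER."""
    if normalize_chrom(chrom1) == normalize_chrom(chrom2):
        return "SV_INTRA"
    return "SV_INTER"


print(sv_label("chr3", "chr3"))  # SV_INTRA
print(sv_label("chr3", "chrX"))  # SV_INTER
print(sv_label("chr3", "3"))     # SV_INTRA
```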

  The file GATK_annotated_SNPs_Indels.vcf.gz contains the SNP and INDEL
  annotations and should be viewed in a genome browser such as the UCSC
  Genome Browser.
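  To read the SnpEff annotations programmatically rather than in a browser,
  the effect terms can be pulled out of each record's INFO column.  This is a
  sketch under an assumption: depending on the SnpEff version, annotations
  are stored under an INFO key named EFF (older releases) or ANN (newer
  releases), so check the file's ##INFO header lines before relying on it.
  The example INFO string and gene/transcript names are hypothetical.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column ("K1=V1;FLAG;K2=V2") into a dict."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True  # keys without '=' are flags
    return out


def snpeff_effects(info: str) -> list:
    """Return (effect, impact) tuples, one per SnpEff annotation entry."""
    fields = parse_info(info)
    results = []
    if isinstance(fields.get("ANN"), str):
        # ANN entries: Allele|Annotation|Annotation_Impact|Gene_Name|...
        for entry in fields["ANN"].split(","):
            parts = entry.split("|")
            results.append((parts[1], parts[2]))
    elif isinstance(fields.get("EFF"), str):
        # EFF entries: Effect(Effect_Impact|Functional_Class|...)
        for entry in fields["EFF"].split(","):
            effect, _, rest = entry.partition("(")
            results.append((effect, rest.split("|")[0]))
    return results


# Hypothetical INFO column in the EFF style.
info = ("AC=2;AF=1.00;EFF=missense_variant(MODERATE|MISSENSE|gCc/gTc|A12V|300|"
        "GENE_X|protein_coding|CODING|TRANSCRIPT_1|2|1),"
        "intron_variant(MODIFIER||||300|GENE_Y|protein_coding|CODING|TRANSCRIPT_2|1|1)")
print(snpeff_effects(info))
# [('missense_variant', 'MODERATE'), ('intron_variant', 'MODIFIER')]
```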

o CNV_track Files
  These BED tracks show the CNVs detected with Control-FREEC.  Copy number
  gains are displayed in red, copy number losses in blue, and normal copy
  number in green.
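  The red/blue/green coloring can be reproduced in a UCSC custom track with
  BED9 lines, whose ninth column (itemRgb) holds an "R,G,B" color that UCSC
  uses when the track line sets itemRgb="On".  The sketch below assumes a
  diploid baseline copy number of 2 and picks its own RGB values; the actual
  thresholds and colors in these tracks may differ, and the helper names are
  hypothetical.

```python
# Assumed RGB values for each copy-number status (not taken from the tracks).
COLORS = {"gain": "255,0,0", "loss": "0,0,255", "normal": "0,128,0"}


def cnv_status(copy_number: int, baseline: int = 2) -> str:
    """Classify a copy number relative to an assumed diploid baseline."""
    if copy_number > baseline:
        return "gain"
    if copy_number < baseline:
        return "loss"
    return "normal"


def cnv_bed9(chrom: str, start: int, end: int, copy_number: int) -> str:
    """Build one tab-separated BED9 line colored by copy-number status."""
    status = cnv_status(copy_number)
    fields = [chrom, start, end, f"CN={copy_number}", 0, ".",
              start, end, COLORS[status]]
    return "\t".join(str(f) for f in fields)


print('track name="CNV example" itemRgb="On"')
print(cnv_bed9("chr1", 1000, 5000, 4))  # gain -> red
print(cnv_bed9("chr1", 8000, 9000, 1))  # loss -> blue
```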

o SV_track Files
  These files contain reformatted output of the structural variations
  detected with SVDetect (Zeitouni et al., 2010):

   Bruno Zeitouni, Valentina Boeva, Isabelle Janoueix-Lerosey, Sophie Loeillet,
   Patricia Legoix-Né, Alain Nicolas, Olivier Delattre, and Emmanuel Barillot.
   SVDetect: a tool to identify genomic structural variations from paired-end
   and mate-pair sequencing data. Bioinformatics (2010) 26(15):1895-1896.

  The input data for SVDetect were Illumina paired-end reads generated on an
  Illumina HiSeq 2500 platform.  The values given in the track for inter- and
  intrachromosomal structural variants are interpreted as follows:
  NORMAL_SENSE: Correct ends orientation, using <mates_orientation> as reference
  REVERSE_SENSE: One of the ends has an incorrect orientation
  DELETION: Deletion (NORMAL_SENSE & mean insert size > mu+threshold*sigma)
  INSERTION: Insertion (NORMAL_SENSE & mean insert size < mu-threshold*sigma)
  INV_FRAGMT: Inversion of a genomic fragment, defined by balanced signatures (BAL)
  INS_FRAGMT: Insertion of a genomic fragment, defined by balanced signatures (BAL)
  LARGE_DUPLI: Large duplication
    (mates_orientation=FR/RF & reversed mate sense & mean insert size >
     mu+threshold*sigma & UNBAL) or
    (mates_orientation=FF/RR & ends order=normal/inverted & mean insert size >
     mu+threshold*sigma & UNBAL)
  DUPLICATION: Duplication, medium size
    (mates_orientation=FR/RF & reversed mates orientation & mean insert size <
     mu-threshold*sigma)
  SMALL_DUPLI: Small duplication
    (mean insert size < mu-threshold*sigma & overlap between subgroups)
  INV_DUPLI: Inverted duplication
    (REVERSE_SENSE & mean insert size < mu-threshold*sigma & UNBAL)
  TRANSLOC: Translocation
  INV_TRANSLOC: Inverted translocation
  COAMPLICON: Co-amplicons, two different fragments repeated in the same strand
    sense (BAL), e.g. A> B>, A> B> A> B>
  INV_COAMPLICON: Inverted co-amplicons, two different fragments repeated in the
    opposite strand sense (BAL), e.g. A> B<, A> B< A> B<
  SINGLETON: Singleton (mean insert size < mu-threshold*sigma), for Illumina
    mate-pairs only
  UNDEFINED: Undefined inter/intra-chromosomal SV type

  Here mu and sigma are the mean and standard deviation of the library insert
  size; BAL and UNBAL denote balanced and unbalanced signatures.
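  The DELETION and INSERTION rules above compare the mean insert size of a
  cluster of NORMAL_SENSE read pairs against the library mean (mu) plus or
  minus a multiple (threshold) of its standard deviation (sigma).  The toy
  function below shows only that thresholding step, not SVDetect's
  clustering; the default threshold of 3 and the function name are
  hypothetical choices for illustration, not the settings used for these
  tracks.

```python
from statistics import mean


def classify_normal_sense(insert_sizes, mu, sigma, threshold=3.0):
    """Apply the DELETION/INSERTION insert-size rules to one NORMAL_SENSE link.

    insert_sizes: insert sizes of the read pairs supporting the link.
    mu, sigma:    library mean and standard deviation of the insert size.
    threshold:    hypothetical multiplier on sigma (assumed default of 3).
    """
    m = mean(insert_sizes)
    if m > mu + threshold * sigma:
        return "DELETION"   # mates map farther apart than the library allows
    if m < mu - threshold * sigma:
        return "INSERTION"  # mates map closer together than the library allows
    return None             # within the expected range: no call from size alone


# Hypothetical library: mu = 400 bp, sigma = 40 bp -> bounds 280..520 bp.
print(classify_normal_sense([1500, 1520, 1480], mu=400, sigma=40))  # DELETION
print(classify_normal_sense([200, 210, 190], mu=400, sigma=40))     # INSERTION
print(classify_normal_sense([400, 410], mu=400, sigma=40))          # None
```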

o VCF Files
  SNPs and INDELs were detected with the GATK (version 2.4-3) HaplotypeCaller.
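  As a quick check on a downloaded VCF, the records can be tallied by variant
  type using only the Python standard library; bgzip output is valid
  multi-member gzip, so gzip.open can stream it.  The classification rule is
  an assumption of this sketch: a record is a SNP when REF and every ALT
  allele are single bases, an MNP when all alleles have equal length greater
  than one, and an INDEL when any ALT allele changes the allele length.

```python
import gzip
from collections import Counter


def variant_type(ref: str, alts: list) -> str:
    """Classify a VCF record from its REF and ALT alleles (see rule above)."""
    if all(len(a) == len(ref) for a in alts):
        return "SNP" if len(ref) == 1 else "MNP"
    return "INDEL"  # at least one ALT allele changes the allele length


def count_types(lines) -> Counter:
    """Tally variant types over an iterable of VCF text lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        ref, alts = cols[3], cols[4].split(",")
        counts[variant_type(ref, alts)] += 1
    return counts


def count_vcf_gz(path: str) -> Counter:
    """Stream a bgzipped (or plain gzipped) VCF and tally its variant types."""
    with gzip.open(path, "rt") as handle:
        return count_types(handle)


# In-memory example; for a real file use count_vcf_gz("<path to .vcf.gz>").
demo = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chr1\t200\t.\tAT\tA\t50\tPASS\t.\n",
    "chr1\t300\t.\tC\tCTT,T\t50\tPASS\t.\n",
    "chr1\t400\t.\tAC\tGT\t50\tPASS\t.\n",
]
print(count_types(demo))
```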

To load a file as a UCSC custom track, right-click its file name to copy the
file's URL and paste that URL into the UCSC custom track tools; this avoids
downloading the file and uploading it again.
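For the bgzipped VCF files, a custom track can point UCSC at the copied URL
with a vcfTabix track line; by default UCSC looks for the tabix index at the
same URL with ".tbi" appended, which matches how the .vcf.gz.tbi files sit
next to the VCF files here.  The helper below is a hypothetical sketch that
only formats that line, and the example URL is a placeholder, not the real
location of these files.

```python
from urllib.parse import urlparse


def vcf_tabix_track(url: str, name: str) -> str:
    """Return a UCSC custom-track line that streams a remote bgzipped VCF."""
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("this sketch only accepts http(s) URLs")
    if not url.endswith(".vcf.gz"):
        raise ValueError("expected a bgzipped VCF ending in .vcf.gz")
    return f'track type=vcfTabix name="{name}" bigDataUrl={url}'


# Placeholder URL; replace it with the link copied from the file listing.
print(vcf_tabix_track("https://example.org/data/variants.vcf.gz", "Six horses"))
```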



© 2003-2019 NAGRP - Bioinformatics Coordination Program.
Contact: NAGRP Bioinformatics Team