Data files in this location represent variations detected in six horses
from six different breeds. The acronyms used in the file names refer to
the following horse breeds:
CH: Native Mongolian Chakouyi Horse
MM: Mangalarga Marchador Horse
AMH: American Miniature Horse
ARB: Arabian Horse
PER: Percheron Horse
TWH: Tennessee Walking Horse
o Annotation Files
The annotation files cover the SNPs, INDELs, SVs, and CNVs detected.
The SNPs and INDELs were annotated using SNPEff v4.0. The CNVs and SVs
were annotated by overlapping their corresponding breakpoints with
ENSEMBL genes using Bedtools (v2.23.0).
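The overlap step can be pictured with a short Python sketch. This is not the
Bedtools command used to produce these files; the gene names and coordinates
below are made up, and intervals are assumed to be 0-based and half-open, as
in BED.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def overlapping_genes(breakpoint: Interval, genes: List[Interval]) -> List[str]:
    """Return the names of genes that share at least one base with the breakpoint."""
    return [
        g.name
        for g in genes
        if g.chrom == breakpoint.chrom
        and g.start < breakpoint.end
        and breakpoint.start < g.end
    ]


# Hypothetical genes and breakpoint interval, for illustration only.
genes = [
    Interval("chr1", 1000, 5000, "GENE_A"),
    Interval("chr1", 8000, 9000, "GENE_B"),
    Interval("chr2", 1000, 5000, "GENE_C"),
]
bp = Interval("chr1", 4500, 8200, "sv_breakpoint_1")
hits = overlapping_genes(bp, genes)
print(hits)
```

Two intervals overlap when each one starts before the other ends; a gene on a
different chromosome never matches, whatever its coordinates.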
The file CNVs+SV.Annotations.txt.zip contains the SV and CNV annotations
and can be viewed in a standard text editor such as Notepad++.
The SV types are SV_INTER for interchromosomal and SV_INTRA for
intrachromosomal.
The file GATK_annotated_SNPs_Indels.vcf.gz contains the SNP and INDEL annotations
and should be viewed in a genome browser such as the UCSC Genome Browser.
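For users who prefer to inspect the VCF outside a browser, the following
Python sketch reads records and splits the INFO column into key/value pairs.
It assumes only the standard tab-separated VCF layout; the example records,
including the EFF value, are placeholders and do not come from the data files.

```python
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key -> value mapping (flags map to '')."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def variant_type(ref: str, alt: str) -> str:
    """SNP when REF and every ALT allele are single bases, otherwise INDEL.

    Simplification: multi-base substitutions would also be labeled INDEL.
    """
    alleles = [ref] + alt.split(",")
    return "SNP" if all(len(a) == 1 for a in alleles) else "INDEL"


def read_records(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, Dict[str, str]]]:
    """Yield (chrom, pos, type, info) for each non-header line."""
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        yield cols[0], int(cols[1]), variant_type(cols[3], cols[4]), parse_info(cols[7])


def records_from_gz(path: str) -> Iterator[Tuple[str, int, str, Dict[str, str]]]:
    """Stream records from a gzip-compressed VCF without unpacking it first."""
    with gzip.open(path, "rt") as handle:
        yield from read_records(handle)


# Placeholder records, for illustration only.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=12;EFF=hypothetical_annotation\n",
    "chr1\t200\t.\tAT\tA\t60\tPASS\tDP=9\n",
]
records = list(read_records(example))
for rec in records:
    print(rec)
```

With a downloaded copy, records_from_gz("GATK_annotated_SNPs_Indels.vcf.gz")
would iterate over the real file in the same way. The name of the INFO key
that holds the SnpEff annotation should be checked in the VCF header.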
o CNV_track Files
CNVs and SVs detected using Control-FREEC. Copy number gains are displayed
in red. Copy number losses are displayed in blue. Normal copy numbers are
displayed in green.
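The color scheme can be expressed as a small Python sketch that turns a
copy-number call into a colored BED line. A diploid baseline (ploidy 2) and
the exact RGB values are assumptions; the track files may have been built
differently.

```python
# Assumed RGB values for the three states described above.
COLORS = {"gain": "255,0,0", "loss": "0,0,255", "normal": "0,128,0"}


def cnv_status(copy_number: int, ploidy: int = 2) -> str:
    """Classify a copy number relative to the assumed baseline ploidy."""
    if copy_number > ploidy:
        return "gain"
    if copy_number < ploidy:
        return "loss"
    return "normal"


def bed_line(chrom: str, start: int, end: int, copy_number: int, ploidy: int = 2) -> str:
    """Format one region as a 9-column BED line with an itemRgb color."""
    status = cnv_status(copy_number, ploidy)
    # Columns: chrom, start, end, name, score, strand, thickStart, thickEnd, itemRgb
    columns = [
        chrom, str(start), str(end), f"{status}_CN{copy_number}",
        "0", ".", str(start), str(end), COLORS[status],
    ]
    return "\t".join(columns)


# Hypothetical regions, for illustration only.
for region in [("chr1", 1000, 2000, 3), ("chr1", 5000, 6000, 1), ("chr2", 100, 900, 2)]:
    print(bed_line(*region))
```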
The track files also contain reformatted output of the structural variations detected with
SVDetect (REF: http://svdetect.sourceforge.net/Site/Home.html):
Bruno Zeitouni, Valentina Boeva, Isabelle Janoueix-Lerosey, Sophie Loeillet,
Patricia Legoix-né, Alain Nicolas, Olivier Delattre, and Emmanuel Barillot.
SVDetect: a tool to identify genomic structural variations from paired-end
and mate-pair sequencing data. Bioinformatics (2010) 26(15):1895-1896.
The input data for SVDetect were Illumina paired-end reads generated on an
Illumina HiSeq 2500 platform. The values given in the track for inter- and
intrachromosomal SVs, and their interpretation, are as follows:
NORMAL_SENSE: Correct ends orientation using < mates_orientation > as reference
REVERSE_SENSE: One of the ends has an incorrect orientation
DELETION: Deletion (NORMAL_SENSE & mean insert size > mu+threshold*sigma)
INSERTION: Insertion (NORMAL_SENSE & mean insert size < mu-threshold*sigma)
INVERSION: Inversion (REVERSE_SENSE)
INV_FRAGMT: Inversion of a genomic fragment, defined by balanced signatures (BAL)
INS_FRAGMT: Insertion of a genomic fragment, defined by balanced signatures (BAL)
INV_INS_FRAGMT: Inverted INS_FRAGMT (BAL)
LARGE_DUPLI: Large duplication
(mates orientation=FR/RF & reversed mate sense & mean insert size >
mu+threshold*sigma & UNBAL) or
(mates_orientation=FF/RR & ends order=normal/inverted & mean insert size >
mu+threshold*sigma & UNBAL)
DUPLICATION: Duplication, medium size
(mates_orientation=FR/RF & reversed mates orientation & mean insert size <
SMALL_DUPLI: Small duplication
(mean insert size < mu-threshold*sigma & overlap between subgroups)
INV_DUPLI: Inverted duplication
(REVERSE_SENSE & mean insert size < mu-threshold*sigma & UNBAL)
INV_TRANSLOC: Inverted translocation
COAMPLICON: Co-amplicons, two different fragments repeated in the same strand
sense (BAL), ex: A > B > , A > B > A > B >
INV_COAMPLICON: Inverted co-amplicons, two different fragments repeated in the
opposite strand sense (BAL), ex: A > B < , A > B < A > B <
SINGLETON: Singleton (mean insert size < mu-threshold*sigma), for Illumina mate-pairs only
UNDEFINED: Undefined inter/intra-chromosomal SV type
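The two simplest insert-size rules in the list, DELETION and INSERTION, can
be sketched in Python as follows. This is an illustration of the thresholds
only, not SVDetect's own code. Here mu stands for the mean insert size of the
library and sigma for its standard deviation (an assumption about the symbols
in the rules), and all numbers are hypothetical.

```python
def classify_normal_sense(mean_insert: float, mu: float, sigma: float, threshold: float) -> str:
    """Classify a cluster of correctly oriented pairs by its mean insert size.

    Pairs spanning more reference sequence than expected suggest a deletion
    in the sample; pairs spanning less suggest an insertion.
    """
    upper = mu + threshold * sigma
    lower = mu - threshold * sigma
    if mean_insert > upper:
        return "DELETION"
    if mean_insert < lower:
        return "INSERTION"
    return "NORMAL_SENSE"


# Hypothetical library: mean insert 300 bp, standard deviation 30 bp,
# threshold 3, so the cutoffs are 210 bp and 390 bp.
for size in (500.0, 150.0, 310.0):
    print(size, classify_normal_sense(size, mu=300.0, sigma=30.0, threshold=3.0))
```

The remaining SV types additionally depend on read orientation, end order,
and balanced or unbalanced signatures, which this sketch leaves out.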
o VCF Files
SNPs and INDELs detected using the HaplotypeCaller procedure of GATK version 2.4-3.
For users who wish to load data to their UCSC custom tracks, right-click on a
file name to get the file location URL to use in your UCSC web tools, so that
you do not need to download/upload files.
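As a rough illustration, a copied URL could be wrapped in a UCSC custom-track
line like the one built by the Python sketch below. The URL, track name, and
description are placeholders. The vcfTabix track type expects a
bgzip-compressed VCF with a tabix index (.tbi) at the same location; whether
the files here meet that requirement is an assumption to verify.

```python
def ucsc_vcf_track_line(name: str, description: str, url: str) -> str:
    """Build a one-line UCSC custom-track definition pointing at a remote VCF."""
    return f'track type=vcfTabix name="{name}" description="{description}" bigDataUrl={url}'


# Placeholder values; substitute the URL copied from the file listing.
line = ucsc_vcf_track_line(
    "ARB_variants",
    "Arabian horse SNPs and INDELs",
    "https://example.org/path/to/ARB.vcf.gz",
)
print(line)
```

The printed line can be pasted into the custom-track text box of the UCSC
Genome Browser.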
1. Mohammed Al Abri (firstname.lastname@example.org)
2. Samantha Brooks (email@example.com)