A data resource of 838 porcine microsatellite sequences with repeat motifs of three to six bases

Peter Karlskov-Mortensen 1, Zhi-Liang Hu 2, Jan Gorodkin 1, James Reecy 2 and Merete Fredholm 1

1: Department of Animal and Veterinary Basic Sciences, Division of Genetics. The Royal Veterinary and Agricultural University, Frederiksberg, Denmark.

2: Department of Animal Science, Center for Integrated Animal Genomics, Iowa State University, 2255 Kildee Hall, Ames, Iowa 50011-3150, USA

(Manuscript submitted)

We recently reported the identification of more than 10000 dinucleotide repeats in the porcine genome (1), based on an analysis of sequence data from the Sino-Danish Pig Genome Sequencing Consortium providing 0.66 X coverage of the porcine genome (3). Here we present 838 perfect simple sequence repeats (SSRs) with repeat motifs of three to six bases, identified in the same sequence data by an approach similar to the one used in our previous analysis. A putative position in the porcine genome has been assigned to 296 SSRs based on pig-human comparative mapping information.

Until now, most of the microsatellite markers available in the porcine genome have been dinucleotide repeat sequences. However, microsatellites with longer repeat motifs may be preferred as genetic markers because problems with stutter bands are minimized and alleles are scored more easily and unambiguously. A PERL script was designed to identify any perfect tri-, tetra-, penta- or hexanucleotide repeat with at least 10 repeats and to determine the position of the repeat in a given sequence. Mismatches and indels within the repeats were not allowed, since imperfect repeats are less polymorphic and therefore less useful as genetic markers. The script was used to analyze approximately 4 million genomic shotgun reads generated from five different breeds of Sus scrofa: ErHuaLian, Duroc, Landrace, Yorkshire and Hampshire (3). Sequences with microsatellite repeats were queried against each other by BLAST to exclude SSRs located in SINE or LINE segments from the dataset.

In total, 838 unique sequences with SSRs were identified. Of these, 190 (22.7 %) were trinucleotide repeats, 584 (69.7 %) were tetranucleotide repeats, 57 (6.8 %) were pentanucleotide repeats and 7 (0.8 %) were hexanucleotide repeats.

All unique sequences with SSRs were queried against the human genome (NCBI Build 36.2, reference sequence) using BLAST, and sequences with a single unique match (E < 1 × 10^-5) were assigned a putative position in the porcine genome based on information from physically anchored pig-human comparative maps (2). All sequences with a putative position in the porcine genome have been placed on the maps in PigQTLdb (http://www.animalgenome.org/QTLdb/pig.html) and are publicly available from there. The complete set of 838 sequences in FASTA format is available as a single file at http://piggenome.dk/ms.

In sixteen sequences the identified SSR was found immediately adjacent to another, shorter repeat, thereby constituting a compound microsatellite. Earlier studies have shown that microsatellites of this type may be less polymorphic than perfect microsatellites and therefore less useful as genetic markers (4). They are, however, not excluded from the dataset, but their special status is noted in the database entry at http://www.animalgenome.org/QTLdb/pig.html.

GenBank accession numbers: EU010405 to EU011243.
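
The search for perfect repeats described above can be illustrated with a short sketch. The original PERL script is not reproduced here; the Python code below is only a minimal, assumed equivalent that reports perfect tri- to hexanucleotide repeats with at least 10 repeat units, together with their position in a sequence. The names find_perfect_ssrs, is_primitive and Repeat are hypothetical, and the decision to discard motifs that are themselves built from a shorter unit (for example ACAC, which is really a dinucleotide repeat) is an assumption about how such a filter might work, not a documented detail of the script.

```python
import re
from typing import NamedTuple


class Repeat(NamedTuple):
    """One perfect repeat tract (hypothetical record type for this sketch)."""
    motif: str
    repeats: int
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def is_primitive(motif):
    # A motif built from a shorter unit (e.g. "ACAC" or "AAA") reappears
    # inside its own doubling at a non-trivial offset; a true tri- to
    # hexanucleotide motif does not.
    return motif not in (motif + motif)[1:-1]


def find_perfect_ssrs(seq, min_unit=3, max_unit=6, min_repeats=10):
    """Return perfect repeats with unit length min_unit..max_unit.

    No mismatches or indels are tolerated: the back-reference forces every
    repeat unit to be identical to the first one.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # One unit of length k, followed by at least min_repeats - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append(
                    Repeat(motif, len(m.group(0)) // k, m.start(), m.end())
                )
    return sorted(hits, key=lambda r: r.start)
```

For example, find_perfect_ssrs("TT" + "CAG" * 10 + "AA") returns a single CAG repeat with 10 units spanning 0-based positions 2 to 32 (end exclusive), whereas a run of nine CAG units or a pure dinucleotide tract such as "AC" * 40 returns an empty list.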

References:

  1. Karlskov-Mortensen P. et al. (2007) Anim Genet, e-pub ahead of print
  2. Meyers S. N. et al. (2005) Genomics 86, 739-52
  3. Wernersson R. et al. (2005) BMC Genomics 6
  4. Wintero A. K. et al. (1992) Genomics 12, 281-8

