NAGRP VCF Data Repository
Discontinued
Genomics analysis is the study of genetic variation and inheritance. The Variant Call Format (VCF) has become central to storing and organizing variant data derived from DNA sequence data.

The purpose of the NAGRP VCF Data Repository is to facilitate collaborative storage, handling, information extraction, querying, and reuse of VCF files. Further access to the raw data may be authorized by the data owner.

Current data collection at a glance:

Species          Number of files   Number of animals   Reported SNPs
cattle           41                42                  46,858,235
chicken          5                 5                   4,835,439
horse            15                22                  19,137,078
pig              2                 2                   0
rainbow trout    1                 1                   0
sheep            2                 6                   0

Institutes                       Number of files
Agricultural Sciences (CAAS)     0
ARS, USDA                        2
Chunbuk National University      1
Chungbuk National University     1
Cornell University               1
Guangxi University               1
Iowa State University            37
SOKENDAI                         1
School of Life                   1
Shanghai Jiao Tong               1
USDA ARS National                1
Univ. Vet. Med. Hannover         3
University of California         1
University of Copenhagen         6
University of Minnesota          2
University of Missouri           1
University of Nebraska           1
University of Nottingham         1
University of Sydney             1
Zoology Inst., CAS, China        1

Contributing PI                  Number of files
Akil Alshawi                     1
Carrie Finno                     1
Claire Wade                      1
Curt Van Tassell                 2
Danika Bannasch                  1
Ganqiu Lan                       1
Guiyan Ni                        1
Haile Berihulay                  1
He Meng                          1
Hideki Innan                     1
James Reecy                      34
Jessica Petersen                 1
Kim Kwan-Suk Kim                 1
Kwan-Suk Kim                     1
Li Rong                          1
Ludovic Orlando                  6
Meng-Hua Li                      1
Molly McCue                      1
Ottmar Distl                     3
Robert D. Schnabel               1
Samantha Brooks                  1
Susan Lamont                     3
Yniv Palti                       1

VCF generator                    Number of files
Axiom                            1
BWA/SAMtools/Freebayes           1
GATK                             2
Golden                           1
NextGENe                         1
Plink                            1
Ponytools                        1
SAMTools                         1
SNP                              1
Unknown                          20
Vcftools                         1
bwa/GATK                         30
bwa/gatk                         3
gemostudio                       1
vcftool                          1

Features:

This platform provides pre-computed summaries so that users can extract the information they need before running heavier computational analyses.

  • Use vcftools to pre-compute some basic statistics (ref)
    • Simple statistics such as counts of each SNP, homozygotes, heterozygotes, etc. (vcf-stats output in JSON format).
    • Estimation of allele frequencies
    • Merging multiple VCF files for combined analysis
  • Build "projects" from multiple VCF files with PLINK/SEQ tools for combined analysis (when each VCF file holds a single animal, this brings multiple animals together).
    • Basic statistics (e.g. v-stats, i-stats, and g-stats) (ref)
    • Statistics across all variants (e.g. non-reference genotypes, number of genotypes with a minor allele, number of heterozygous genotypes for an individual, total number of called variants for an individual, genotyping rate for an individual, etc.)
    • More sample outputs
  • Convert VCF files to BED format so that BEDOPS tools can be used for data extraction and analysis (ref).
    • Set operations: extract features, match features, etc.
    • Statistics: common statistical operations by mapping overlapping features, merging files, etc.
    • File management: Starch compression, fast lexicographical sorting, data extraction, conversion, etc.
  • Pre-compute some summary statistics, including but not limited to:
    • Filtering SNPs based on sequencing depth and frequencies
    • Estimating allele frequencies among multiple animals
    • Counting the number of homozygotes, heterozygotes, and phased/unphased SNPs (vcftools)
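
The kinds of summaries listed above can be illustrated with a minimal Python sketch. It is not the repository's pipeline (which, per the list, relies on vcftools, PLINK/SEQ, and BEDOPS); it only shows what such statistics mean for individual VCF records. The sample lines, the function names, and the use of the INFO DP field as the depth measure are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical three-sample VCF data lines (tab-separated), invented for illustration.
SAMPLE_LINES = [
    "1\t100\t.\tA\tG\t50\tPASS\tDP=30\tGT\t0/0\t0/1\t1/1",
    "1\t200\t.\tC\tT\t40\tPASS\tDP=4\tGT\t0|1\t./.\t0|0",
]


def parse_record(line):
    """Split one VCF data line into fixed fields, INFO pairs, and genotypes."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_map = dict(
        kv.split("=", 1) if "=" in kv else (kv, "") for kv in info.split(";")
    )
    gt_index = fields[8].split(":").index("GT")
    genotypes = [sample.split(":")[gt_index] for sample in fields[9:]]
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "info": info_map, "genotypes": genotypes}


def genotype_summary(genotypes):
    """Count homozygous, heterozygous, phased, unphased, and missing calls."""
    counts = Counter()
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            counts["missing"] += 1
            continue
        counts["phased" if "|" in gt else "unphased"] += 1
        counts["homozygous" if len(set(alleles)) == 1 else "heterozygous"] += 1
    return counts


def alt_allele_frequency(genotypes):
    """Fraction of called alleles that are non-reference (None if no calls)."""
    called = [a for gt in genotypes
              for a in gt.replace("|", "/").split("/") if a != "."]
    if not called:
        return None
    return sum(a != "0" for a in called) / len(called)


def passes_depth(record, min_depth):
    """Depth filter; assumes the site-level depth is stored in INFO as DP."""
    return int(record["info"].get("DP", 0)) >= min_depth


def to_bed_interval(record):
    """Convert the 1-based VCF position to a 0-based, half-open BED interval."""
    start = record["pos"] - 1
    return (record["chrom"], start, start + len(record["ref"]))


for line in SAMPLE_LINES:
    rec = parse_record(line)
    print(to_bed_interval(rec), dict(genotype_summary(rec["genotypes"])),
          alt_allele_frequency(rec["genotypes"]), passes_depth(rec, 10))
```

For real files, dedicated tools handle multi-allelic sites, missing FORMAT fields, and compressed input far more robustly than this sketch does.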

To participate:

Follow these simple steps to begin using the data repository:

  • Step 1: Log in with your AnGenMap Directory account (registration is free);
  • Step 2: Use the web form to submit your VCF file meta information;
  • Step 3: Upload your VCF file;
  • Step 4: Access pre-computed data summary statistics.
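
Before uploading in Step 3, it may be worth confirming locally that a file has a well-formed VCF header. The sketch below is one way to do that in Python. The checks the repository itself applies on upload are not described here, so the function name `check_vcf_header` and the rules it enforces (a `##fileformat=VCF` first line and the eight mandatory columns of the `#CHROM` line) are assumptions based on the VCF format, not on the site.

```python
import gzip
import io

# The eight mandatory, tab-separated columns of the VCF column header line.
REQUIRED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def check_vcf_header(stream):
    """Return (problems, sample_names) for the header of a VCF text stream."""
    problems = []
    line = stream.readline()
    if not line.startswith("##fileformat=VCF"):
        problems.append("first line is not a ##fileformat=VCF declaration")
    # Skip the remaining meta-information lines, which all start with '##'.
    while line.startswith("##"):
        line = stream.readline()
    columns = line.rstrip("\n").split("\t")
    if columns[:8] != REQUIRED_COLUMNS:
        problems.append("missing or malformed #CHROM column header line")
        return problems, []
    # Sample names, if any, follow the FORMAT column (the ninth column).
    return problems, columns[9:]


# Hypothetical in-memory header, used here in place of a real file.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "##source=hypothetical\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tanimal_1\n"
)
print(check_vcf_header(example))
```

With a file on disk, the same check would be run as `check_vcf_header(open_vcf(path))`, where `path` is the file to be uploaded.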

Outlook:

We will seek out and work with any public platforms that may complement our efforts, for example, NCBI Variation Database and Ensembl Variation Database.

© 2003-2024 USA - USDA - NRSP8 Program for Applied Bioinformatics.
Contact: Bioinformatics Team