PAG-XV workshop
QTL Database Round Table Discussion Workshop Summary
A QTL Database Round Table Discussion workshop was held in the Towne Room, Town & Country Hotel, San Diego (during the PAG meeting) on January 16, 2007. The meeting was co-organized by James Reecy of Iowa State University, the NRSP8 Bioinformatics Coordinator, and David Burt of Roslin Institute, UK. The meeting aimed to set the stage for developing the minimal information required for publishing QTL and for phenotype ontology development.
About 20 or more people participated in the discussion. Below is an incomplete list of participants: Alan Archibald, Andy Law, David Burt, Jan Aerts, Wilfrid Carre (from Roslin Institute, ARKdb), Mary Shimoyama (from Medical College of Wisconsin, RGD), Susan Bello (from the Jackson Laboratory, MGD), David Adelson (from Texas A&M), Kim Pruitt (from NCBI), Eric Fritz, LaRon Hughes, Susan Lamont, Max Rothschild, James Reecy and Zhiliang Hu (from Iowa State University, AnimalQTLdb).
- Here are the slides from this workshop.
- Below are brief meeting notes by Eric et al.:
The meeting consisted of two parts: a discussion on the Minimum Information required for QTL and Association Studies (MIQAS) and a discussion on developing the Animal Trait Ontology (ATO).
MIQAS Discussions
Zhiliang Hu presented the experience and lessons from developing the Animal QTLdb. His discussion included:
- Parameters to describe the QTL
- Sensibly
- Reliably
- Distinctly
- Things to be considered for the database
- Animals
- Traits
- Tests
- Maps
- Minimum requirements to enter
- Trait description
- At least 1 map location
- At least 1 statistic parameter
- Animal description
- Experiment description
- Problems with published papers
- Hard to extract data
- Mixed information with QTL/associations
- Varying results
- Need a standardized template
- Minimum information required
- Recommended criteria
- Make better use of the information for databases
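The minimum requirements listed above lend themselves to an automated check at submission time. The following is a minimal sketch only: the `QTLSubmission` class, its field names, and the sample values are hypothetical illustrations, not the actual AnimalQTLdb schema or data.

```python
from dataclasses import dataclass, field


@dataclass
class QTLSubmission:
    """Hypothetical submission record; field names are illustrative only."""
    trait_description: str = ""
    animal_description: str = ""
    experiment_description: str = ""
    map_locations: list = field(default_factory=list)  # e.g. [("chr1", 45.0)]
    statistics: dict = field(default_factory=dict)     # e.g. {"LOD": 3.2}


def missing_requirements(sub: QTLSubmission) -> list:
    """Return the names of the minimum requirements that the submission lacks."""
    missing = []
    if not sub.trait_description.strip():
        missing.append("trait description")
    if len(sub.map_locations) < 1:
        missing.append("at least 1 map location")
    if len(sub.statistics) < 1:
        missing.append("at least 1 statistic parameter")
    if not sub.animal_description.strip():
        missing.append("animal description")
    if not sub.experiment_description.strip():
        missing.append("experiment description")
    return missing


# Made-up example values, for illustration only
complete = QTLSubmission(
    trait_description="backfat thickness",
    animal_description="F2 cross of two breeds",
    experiment_description="genome scan with interval mapping",
    map_locations=[("chr1", 45.0)],
    statistics={"LOD": 3.2},
)
incomplete = QTLSubmission(trait_description="backfat thickness")

print(missing_requirements(complete))
print(missing_requirements(incomplete))
```

A submission tool could refuse a record until this list is empty, which is one way to enforce a standardized template.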
Mary Shimoyama presented how the Rat Genome Database was set up at MCW:
- Can query the database by using
- Trait
- Strains
- Chromosome
- LOD Score
- p-value
- Reference
- Minimum information required
- Peak marker
- Phenotype
- LOD score/p-value
- Measurement method
- Strains & crosses information
- Nomenclature use
- Use the author’s nomenclature when possible
- Standardize and number to existing traits
- Avoid gene symbols
- Don’t use disease terms
- Separate QTLs, each experiment is entered as a new QTL
- Standardizing with the use of 2 ontologies
- MPO
- Disease Ontology
- Other information about the database
- Links to browsers
- Some information on other details is in text form
- Various phenotypes of a strain
- Candidate gene relationships
- Trait table – traits with subtraits
- Conditions with traits & method of measurement
- Classified by author interpretation
- These are often not intuitive
- Don’t follow a logical pattern/hierarchy
- Issues and New Directions
- Lack of Consistent standards
- Mapping Issues
- QTL with only peak marker
- SSLPs are not mapped
- SSLPs with lost mapping status
- Traits
- Logical
- Hierarchical
- Cross species
- Difficulties in using PATO for the RGD trait data.
- How would the PATO developers define blood pressure using PATO concepts?
Her discussion raised a nomenclature issue (suggest author’s input).
Susan Bello presented how the Mouse Genome Database was set up:
- MP ontology problems
- What ID strategy/format should be used
- No map -> want to fix
- All information of QTL crammed into a paragraph at the bottom
- MP ID -> Links to ontology – description of trait plus a modifier
- Mammalian Trait Ontology
- Reduce and standardize
- Terms – get good definitions for traits
- All variants on 1 page for easier comparison
- Relationship types with candidate genes
Her presentation brought about some interesting points for discussion:
- Unique parameters:
- Allele information
- Animal strain information
- Comparative phenotypes
- Phenotype models
- Display of parental information on a QTL – pedigree information housed
- Good lessons learnt from using PATO to annotate mouse traits
- Complex situation considering how traits are formed from phenotypes (linked more closely to underlying genes)
- Seems necessary to separate the trait ontology from the phenotype ontology
- How to develop trait and phenotype ontologies: integratively or separately?
David Adelson presented their work on a Cattle QTL Viewer
- Create a database that was searchable for QTL information
- Search for QTL and display the information
- Marker data & QTL data are separate so you don’t have to change everything on a new update
- Use of a maximum of 10 shades of color on the webpage
- Non-selective – if it gets published, it gets put in
- Don’t need details, users can read the paper if needed
- Papers on the same QTL linked as references
- Gbrowse browser used but no links to candidate genes
- Limitations
- Quality of Assembly
- Search tool using smart string/text matches
- Data formats and interoperability
- QTL data to database upon acceptance into journal
- Each marker tested should be required
- Most data SNP based rather than microsatellite based
- Each database is curated differently and should be interoperable via web services
Kim Pruitt presented their work at NCBI on developing dbGaP
- Two levels of access
- Open – general information
- Controlled – pedigree information
- Study report/Variable report
- Both have a unique stable identifier
- Genome Browser – zoom function allows for more information
- Table scrolls – all are linked
- Retrieves data from map viewer database
One unique feature of dbGaP is that it has pedigree information. However, descriptions at the trait and phenotype levels still seem to be needed.
Jan Aerts presented his thoughts on Minimal Standards for QTL/Association Studies
- QTL handling
- Get information
- Narrow down QTL regions
- Information needed to be in the paper as free text or printed tables
- Existing or custom map used
- Positions of markers
- Confidence interval
- Need a set of rules for minimal information
- Need a file format for easy synchronization of databases/data exchange
- MIQAS
- QTL software -> export function
- Tool -> walks authors through data collection
- Submit to “MIQAS-compliant” database
- Support by the QTL Community
- Low threshold values to use
- Encouragement by journals
- Issues/Possibilities
- Legacy data
- Centrally assigned accession numbers
- Central repository maps and populations
- Causative genes within QTL
- Status
- Discuss
- Finalize format (YAML format)
- Involve QTL
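The exchange-format idea above (map used, marker positions, confidence interval, exported by QTL software) can be illustrated with a small round trip. This is a rough sketch under stated assumptions: the field names and values are hypothetical, no MIQAS format had been finalized at the time of these notes, and JSON from the Python standard library stands in for the YAML format mentioned in the notes, because the standard library has no YAML writer.

```python
import json


def export_qtl(record: dict) -> str:
    """Serialize a QTL record to text, with sorted keys so exports diff cleanly."""
    return json.dumps(record, indent=2, sort_keys=True)


def import_qtl(text: str) -> dict:
    """Parse a record previously written by export_qtl."""
    return json.loads(text)


# Hypothetical record; names and numbers are illustrative only
record = {
    "map": {"name": "custom linkage map", "type": "custom"},
    "markers": [
        {"name": "marker_A", "position_cM": 41.5},
        {"name": "marker_B", "position_cM": 52.0},
    ],
    "confidence_interval_cM": [41.5, 52.0],
}

text = export_qtl(record)
print(text)

# A receiving database should recover exactly what the sender exported
assert import_qtl(text) == record
```

The point of the sketch is the round trip: if every database can read what every QTL tool writes, synchronization becomes a file exchange rather than manual curation of free text.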
Alan Archibald presented his perspective, as a former journal editor, on developing a standard for QTL publications
- Common Interests
- Editors have a tabloid-esque view
- Greater regulation needed on editing but not extreme
- Tools to run checks – quality blocks
- Does it exist and is it what the author says it is
- Problems with the types of databases
- The information is curated
- The information is just dumped in
Open Discussion
The meeting then entered an open discussion on minimal standards:
- Need something better than user beware
- Databases run out of something like NCBI are more stable
- Stability of a database: long-lasting or not? How long?
- Worries about the future maintenance of such a database: how realistic is it that it will be accepted by the world community now and last into the future?
- Journals not being rigorous enough
- Checks on data reliability – more acceptable
- Database existence and stability an issue
- Need data in the same format for easy transfer
- Much need for a structure/format
- Vision for data to be integratable with other databases.
- Users don’t understand the difference between curation and just dumping the data in (whose responsibility is it to solve this problem? Or is a mechanism needed to flag the nature of such data?)
- Actions to be taken
- Take notes on the discussions
- Send out a 1st draft of minimal standards to be reviewed
- Get an agreed upon product
- Send to a journal
- Eventually get the plants involved because this is not constrained by organism
- Approach with a working document
- Contact software developers for involvement
- Funders from industry and their limitations
- Get industry comments on this idea
ATO Discussions
Zhiliang Hu presented their work on developing ATO as part of their efforts to develop QTLdb.
- Trait measurement variations
- Databases used for future use of data, not just storage of data
- Challenges
- Best way to classify
- Best way to organize
- Best way to manage
- Ontology is an agreement of vocabulary
- Data Management
- ATO
- COB
- Traits to phenotypes
- May not all agree, but a tool with which to merge items
- Pointed out that a community effort is needed.
LaRon Hughes presented an overview of ATO and PATO Concepts
- Ontology used to communicate information effectively
- Centralize phenotype information for collaboration
- Examples
- Protégé
- OBO-edit
- Integrate QTL with ontology information
- A means to advance science by allowing collaboration
- PATO
- Compositionality
- Pre-composition
- Post-composition
- building blocks
- Entities
- Qualities
- Compositionality
- Pre-composition – end user understands terminology
- Post-composition – more comprehensive search
- Downside – Envisioned for simple organisms
- Gene oriented
- Vertebrates are too complicated
- Morphological traits lend to PATO
- Physiological
- Behavioral
- Don’t expose the end user to query and data manipulation and inner workings
- Work current databases into PATO, i.e. find something in common
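The distinction between pre-composition and post-composition can be made concrete with a toy example. This is a minimal sketch only: the terms below are made-up illustrations rather than real PATO or ATO entries, and `PostComposed`, `find_by_quality`, and `label` are hypothetical names.

```python
from typing import NamedTuple


class PostComposed(NamedTuple):
    """A trait expressed from building blocks: an entity plus a quality."""
    entity: str   # what is being described
    quality: str  # the quality attributed to the entity


# Pre-composed: one ready-made term per trait, easy for end users to read
pre_composed_terms = ["body weight", "tail length", "femur length"]

# Post-composed: the same traits built from entity + quality
post_composed_terms = [
    PostComposed("body", "weight"),
    PostComposed("tail", "length"),
    PostComposed("femur", "length"),
]


def find_by_quality(terms, quality):
    """Search across entities by quality, which flat term strings do not support directly."""
    return [t for t in terms if t.quality == quality]


def label(term):
    """Render a post-composed term as a readable label, hiding the structure from the user."""
    return f"{term.entity} {term.quality}"


lengths = find_by_quality(post_composed_terms, "length")
print([label(t) for t in lengths])
```

Rendering labels from the structured form is one way to keep the familiar terminology of pre-composition in the interface while still allowing the broader searches that post-composition offers.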
Wilfrid Carre presented their work on developing a Chicken QTL Phenotype Ontology
- Mammalian Phenotype Ontology (MGI website) not suitable for production traits.
- Defined a 3-level ontology based on the different publications
- Interactions between traits / Traits belonging to different groups
- Trait given at a specific age or stage
- Environmental conditions
- Different ways to look at and classify traits
- Compared the Roslin way and the ATO way of organising traits
- Other trait-oriented databases
- OMIM, for human
- MGI for mouse
- OMIA contains traits from 135 animal species
- dbGaP has a phenotype component as well
- Concluding remarks:
- How to make the connection between different traits?
- How to integrate OMIA and OMIM data to the trait ontology?
- Comparative mapping requires a similar ontology from one species to another
- Common traits should be in the same superclass and have almost the same subcategories, even if they reflect different production in different species
- Necessity to standardize abbreviations of traits
Open Discussion, Where to go
- Move towards generalized phenotypes
- Some authors are good at submission and some aren’t
- How to draw a line between phenotype and method
- Searchable, multiple ontologies
- Use of modifiers in ontologies (Andy Law)
- More orthogonal interface to enable more powerful search capabilities
- Hide “ontology” from the editor interface
- Isolate the ontology into an internal “compartment”
- Decompose everything down (PATO) vs partial decomposition of traits
- Be able to step away from the data
- Properties of what you are trying to represent
- Ways to move forward?
- Interoperability
- Object oriented mapping to databases
- The process
- Come up with elements that are needed for QTLs
- Get a framework – inter-working vocabulary
- Words are species specific but have the same structure used in all species
- Common Categories
- Mouse and Rat
- First to go at integrating
- Don’t aim for a single ontology
- Define objects and map later
- Agree on qualities needed for this (mouse and rat working together)
- Agreement to map across species with some sample data format usage
- Use of webservices to implement certain goals
- Central access (short term storage at NCBI?)
- Problems with integration
- Use a pre-assignment of identifier space – each database has its own chunk
- Authors submit to only 1 database (the database of their species)
- Legacy of data to avoid duplication
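The suggestion to pre-assign identifier space, with each database owning its own chunk, can be sketched as follows. This is an illustration under assumptions: the database names, the block size, and the `IdSpace` and `assign_blocks` helpers are hypothetical and do not describe any scheme the group agreed on.

```python
class IdSpace:
    """Hands out accession numbers from a pre-assigned half-open range [start, stop)."""

    def __init__(self, database, start, stop):
        self.database = database
        self.start = start
        self.stop = stop
        self._next = start

    def new_id(self):
        """Return the next unused identifier, or fail once the block is used up."""
        if self._next >= self.stop:
            raise RuntimeError(f"{self.database} has exhausted its identifier block")
        value = self._next
        self._next += 1
        return value


def assign_blocks(databases, block_size):
    """Give each database a consecutive, non-overlapping block of identifiers."""
    return {
        name: IdSpace(name, i * block_size + 1, (i + 1) * block_size + 1)
        for i, name in enumerate(databases)
    }


# Hypothetical database names and block size
spaces = assign_blocks(["db_a", "db_b", "db_c"], block_size=1_000_000)
first = spaces["db_a"].new_id()
other = spaces["db_b"].new_id()
print(first, other)
```

Because the blocks never overlap, each database can issue identifiers independently, with no central service consulted at submission time, and merged data sets still have no collisions.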
Continued discussions tentatively planned for September
- By that time have agreed upon minimal standards
- Be able to expand beyond the current working group
- A manuscript will be prepared and discussed at the next Genome Informatics meeting (September 2007), with publication scheduled before the end of the year.
--ZhiliangHu 10:28, 1 February 2007 (CST)
