Dr Tudor Groza, Dr Simon Kocbek, Ahmed Muaz, Dr Frank Lin
Dr Tudor Groza
Tel: +61 (0)2 9355 5717
The KCCG Phenomics program aims to provide comprehensive solutions to enrich the understanding of the associations occurring between diseases, genotype, phenotype and environment via structured knowledge representation and discovery techniques.
Phenomics is the study of the collective physical and biochemical characteristics of an organism. Phenomics plays a crucial role in enabling genomic data to be interpreted and generating disease diagnoses.
For genomic research to advance and improve patient care, tight integration of phenomics and genomics is required. This will enable:
- Interpretation and prioritisation of the millions of variants present in each patient
- Comprehensive linkage of detailed phenotypic terms to genomic variants and diseases
Combining detailed phenotype profiles with clinical genomics data will drive progress in our understanding of the human genome and enable effective integration of genomics into the clinic, to support faster and more accurate diagnosis of rare and complex conditions.
The KCCG Phenomics Program applies natural language processing and ontology-driven techniques to recognise phenotypic information in electronic medical records (EMR), patient notes, case reports, and scientific and medical literature. This information is converted into machine-readable terminology, such as that based on the Human Phenotype Ontology (HPO). Automated processes can then turn large volumes of unstructured text into new knowledge, and feed phenotype analytics and visualisation tools for patients, clinicians and researchers.
Machine Readable Knowledge Representation
Computational phenotyping can only be achieved if the knowledge about diseases, medically relevant phenotypes and medically relevant risk factors is expressed in a machine-readable and machine-interpretable representation. The team contributes to creating and enriching ontologies that model these domains as part of the global Monarch Initiative.
Automated Knowledge Acquisition and Analytic Tools
Most of today’s clinical data is stored in the form of free text notes or observations. We need to bridge the gap between this unstructured representation of the data and the machine-readable representation of the corresponding knowledge. The Phenomics team devises Natural Language Processing and Machine Learning mechanisms to extract meaningful concepts from free text data, to enable computational phenotyping and phenotype analytics.
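As a rough illustration of what mapping free text to machine-readable concepts involves, the sketch below performs a simple longest-match dictionary lookup. It is not the team's method: the `LEXICON` table and the `annotate` function are hypothetical, and the HPO identifiers are included for illustration only and should be checked against a current HPO release. Real clinical text also needs handling of negation, spelling variants and abbreviations, which this sketch ignores.

```python
import re

# Hypothetical mini-lexicon: lowercase label or synonym -> HPO identifier.
# A real system would load labels and synonyms from an HPO release.
LEXICON = {
    "seizure": "HP:0001250",
    "seizures": "HP:0001250",
    "microcephaly": "HP:0000252",
    "short stature": "HP:0004322",
    "global developmental delay": "HP:0001263",
}


def annotate(text, lexicon=LEXICON):
    """Return (start, end, matched_text, hpo_id) tuples found in text.

    At each token position the longest lexicon entry is preferred, so
    "global developmental delay" wins over any shorter overlapping label.
    """
    max_len = max(len(label.split()) for label in lexicon)
    # Keep character offsets so annotations can point back into the note.
    tokens = [(m.group(0), m.start(), m.end())
              for m in re.finditer(r"[A-Za-z]+", text)]
    annotations = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tok.lower() for tok, _, _ in tokens[i:i + n])
            if candidate in lexicon:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                annotations.append((start, end, text[start:end], lexicon[candidate]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return annotations


note = "Patient presents with short stature and recurrent seizures."
for start, end, span, hpo_id in annotate(note):
    print(start, end, span, hpo_id)
```

The character offsets are kept so that each structured annotation can be traced back to the exact span of the original note.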
Information extraction provides the means to create a channel between the now structured clinical data and the existing body of bio-medical knowledge. This, subsequently, supports various analytical tasks. The Phenomics team focuses on decision support methods to aid diagnosis, and hence uses phenotype-driven approaches to explore candidate disorders or to prioritise gene interpretation.
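To make the idea of phenotype-driven exploration of candidate disorders concrete, the sketch below ranks disorders by the overlap between a patient's phenotype terms and each disorder's annotated terms. It uses plain Jaccard similarity as a stand-in; production tools typically use ontology-aware semantic similarity instead, and nothing here is taken from the team's software. The disorder names and their term sets are invented placeholders, not real annotations.

```python
def rank_disorders(patient_terms, disease_profiles):
    """Rank disorders by Jaccard overlap with the patient's phenotype terms.

    patient_terms: iterable of HPO identifiers observed in the patient.
    disease_profiles: dict mapping a disorder name to its HPO identifiers.
    Returns (disorder, score) pairs, best score first; ties sort by name.
    """
    patient = set(patient_terms)
    scores = []
    for disease, terms in disease_profiles.items():
        terms = set(terms)
        union = patient | terms
        # Jaccard similarity: shared terms divided by all distinct terms.
        score = len(patient & terms) / len(union) if union else 0.0
        scores.append((disease, score))
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Placeholder profiles for illustration only (not real disease annotations).
profiles = {
    "disorder_A": {"HP:0001250", "HP:0000252"},
    "disorder_B": {"HP:0004322"},
    "disorder_C": {"HP:0001250", "HP:0000252", "HP:0001263"},
}
patient = {"HP:0001250", "HP:0000252"}

for disorder, score in rank_disorders(patient, profiles):
    print(f"{disorder}: {score:.2f}")
```

A set-overlap score treats every term as unrelated to every other term. Using the ontology hierarchy, so that a specific term partially matches its more general parent, is the main refinement a real system would add.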
The analytical methods developed by the Phenomics team aim to support the decision-making process by providing the clinician with exploratory options. These options are only efficient and useful if they are presented in an intuitive and easy-to-use manner. The team therefore considers visualisation tools that represent complex knowledge in a user-friendly way to be a critical component of the overall vision.
Product development and clinical applications
The software developed by the Phenomics Team accelerates translational and clinical applications of genomic technologies through harmonising phenomic information and the intelligent distillation of its informative content. It also enables phenotypic analyses to provide a translational bridge from genome-scale biology to a patient-centered view on human disease pathogenesis.
Patient Archive is a clinical-grade, phenotype-oriented patient data management platform that combines the richness of the Human Phenotype Ontology with highly intuitive user interfaces to aid discovery and decision-making in clinical genomics. The platform enables deep computational phenotyping and collaboration through local and global patient data sharing (via the Matchmaker Exchange initiative).
It is the only platform that enables clinicians to use free text clinical notes for structured patient phenotyping, store the data securely (sensitive patient data is encrypted) and share the data via a fine-grained access control model. The platform also supports intelligent analytics focused on disease exploration, patient match-making and prescriptive phenotyping. The demo version of the latest release is available at http://patientarchive.org. Get in touch with us for a demo of the latest development version.
The HPO concept recognizer (HPO CR) is a high-precision phenotype concept recognition solution that maps free text clinical data to Human Phenotype Ontology concepts quickly and accurately. It is currently used in various applications, including Patient Archive, MyGene2 and the Monarch Initiative.
The underlying technique has been built to cater for the high lexical variability associated with clinical phenotypes, to decompose coordinated terms and to detect non-canonical phenotypes. The concept recognizer is part of Patient Archive and serves other platforms, such as University of Washington’s MyGene2 (https://www.mygene2.org/MyGene2/#/about) and Baylor’s OMIM Explorer (http://omimexplorer.research.bcm.edu:3838/omim_explorer/). It has also been used to generate the first HPO annotation dataset for common diseases, available via our PubMed Browser (http://pubmed-browser.human-phenotype-ontology.org/). The HPO CR package is freely available for academic use on request.
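Decomposing a coordinated term means splitting a phrase such as "short and broad fingers" into the separate phenotypes it expresses. The sketch below handles only the simplest case, two single-word modifiers sharing one head noun phrase, and is an assumption-laden illustration rather than a description of how the HPO CR package works. The `COORD` pattern and the `decompose` function are hypothetical names.

```python
import re

# Matches "<modifier> and|or <modifier> <head>", for example
# "short and broad fingers". Only single-word modifiers are handled.
COORD = re.compile(
    r"^(?P<first>[\w-]+)\s+(?:and|or)\s+(?P<second>[\w-]+)\s+(?P<head>.+)$",
    re.IGNORECASE,
)


def decompose(phrase):
    """Split a coordinated modifier phrase into one phrase per modifier.

    Phrases that do not fit the pattern are returned unchanged in a
    one-element list, so callers can always iterate over the result.
    """
    phrase = phrase.strip()
    match = COORD.match(phrase)
    if not match:
        return [phrase]
    head = match.group("head")
    return [f"{match.group('first')} {head}", f"{match.group('second')} {head}"]


print(decompose("short and broad fingers"))
print(decompose("seizures"))
```

Each resulting phrase can then be looked up against the ontology on its own. Coordination of whole phenotypes ("seizures and microcephaly"), longer modifier lists and nested structures need a proper syntactic analysis and are deliberately left untouched here.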
Coming soon: Journal manuscript annotator – for creating structured pheno-packets
An innovative platform for curating domain knowledge, focused on intuitive and friendly user interfaces and workflow-based knowledge curation and acquisition. This area of development aims to improve current knowledge curation workflows for rare disease nomenclatures. An instance of the platform is currently used by the Orphanet consortium to manage the editorial process and curate the content of the Orphanet terminology.
The system will soon be available for public use in the context of the Orphanet data curation initiative.
- Groza T, Köhler S, Moldenhauer D, Vasilevsky N, Baynam G, Zemojtel T, Schriml LM, Kibbe WA, Schofield PN, Beck T, Vasant D, Brookes AJ, Zankl A, Washington NL, Mungall CJ, Lewis SE, Haendel MA, Parkinson H, Robinson PN. The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease. American Journal of Human Genetics, 97(1):111-24, 2015
- Oellrich A, Collier N, Groza T, Rebholz-Schuhmann D, Shah N, Bodenreider O, Boland MR, Georgiev I, Liu H, Livingston K, Luna A, Mallon AM, Manda P, Robinson PN, Rustici G, Simon M, Wang L, Winnenburg R, Dumontier M. The digital revolution in phenotyping. Brief Bioinform. 2015 Sep 29. pii: bbv083.
- Groza T, Köhler S, Doelken S, Collier N, Oellrich A, Smedley D, Couto FM, Baynam G, Zankl A, Robinson PN. Automatic concept recognition using the human phenotype ontology reference and test suite corpora. Database (Oxford). 2015 Feb 27;2015. pii: bav005.
- Mungall CJ, Washington NL, Nguyen-Xuan J, Condit C, Smedley D, Köhler S, Groza T, Shefchek K, Hochheiser H, Robinson PN, Lewis SE, Haendel MA. Use of model organism and disease databases to support matchmaking for human disease gene discovery. Hum Mutat. 2015 Oct;36(10):979-84.
- Baynam G, Walters M, Claes P, Kung S, LeSouef P, Dawkins H, Bellgard M, Girdea M, Brudno M, Robinson P, Zankl A, Groza T, Gillett D, Goldblatt J. Phenotyping: targeting genotype's rich cousin for diagnosis. J Paediatr Child Health. 2015 Apr;51(4):381-6.
- Groza T, Verspoor K. Assessing the impact of case sensitivity and term information gain on biomedical concept recognition. PLoS One. 2015 Mar 19;10(3):e0119091.
- Paul R, Groza T, Hunter J, Zankl A. Inferring characteristic phenotypes via class association rule mining in the bone dysplasia domain. J Biomed Inform. 2014 Apr;48:73-83.
- Collier N, Oellrich A, Groza T. Toward knowledge support for analysis and interpretation of complex traits. Genome Biol. 2013;14(9):214.
- Groza T, Hunter J, Zankl A. Mining skeletal phenotype descriptions from scientific literature. PLoS One. 2013;8(2):e55656.
JAX Labs, US
Oregon Health & Science University, US
Sanford Health, US
Berkeley Labs, US
Genomics England, UK
Charite Medical University Hospital, Berlin, Germany
Keio University, Japan
Database Centre for Life Science (DBCLS), Japan
Office of Population Health Genomics (OPHG) Perth, Western Australia
Genetic Services Western Australia
Sick Kids Hospital, Toronto, Canada