Structural biology
Structural biology has had an extraordinary impact on science. Beginning with the structure of DNA in 1953, thousands of biomolecular processes have been elucidated in atomic detail, contributing to many breakthroughs in healthcare. However, the field is far from complete: the 3D structure of most human proteins has not yet been experimentally determined, despite significant advances (especially in cryoEM1).
The BioVis Centre is developing strategies to improve how structural biology data is used. One strategy involves using high-throughput machine learning to systematically model all proteins with unknown structure2. Other strategies involve using recent advances in data visualization3, computer graphics4, 3D printing5, augmented and virtual reality (AR and VR)6, commodity 3D interaction devices7, crowdsourced evaluation8,9, computer gaming10,11, and web technologies12,13.
The 'dark' proteome
To date, only a small fraction of the proteome has been experimentally observed using structural biology techniques, such as X-ray crystallography (corresponding to ‘PDB’ regions in the accompanying figure). However, by modelling these data with high-throughput machine learning1, we can predict the 3D configuration of about half of the proteome2 (‘gray regions’ in the figure). The remaining ‘dark’ proteome is a scientific mystery3.
At the BioVis Centre, we are systematically exploring the dark proteome using bioinformatics and data visualization4. We have found that much of the dark proteome has unexpected features that currently cannot be explained. In the human proteome, we found that dark proteins are surprisingly abundant across many tissues, indicating that they must perform important cellular functions5. We expect that ongoing studies of the dark proteome will reveal new insight into the molecular events underlying health and disease.
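The three categories described above (PDB, gray, and dark) can be illustrated per residue. The following is a minimal sketch, not the Centre's actual pipeline: the function names, the 1-based inclusive interval format, and the rule that experimental coverage takes precedence over predicted coverage are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical interval format: 1-based, inclusive residue ranges.
Interval = Tuple[int, int]


def classify_residues(
    length: int,
    pdb_regions: List[Interval],
    model_regions: List[Interval],
) -> List[str]:
    """Label each residue of one protein as 'pdb', 'gray', or 'dark'."""
    if length <= 0:
        raise ValueError("length must be positive")
    # Every residue starts out dark (no structural information).
    labels = ["dark"] * length
    # Residues covered by a predicted model become gray.
    for start, end in model_regions:
        for i in range(max(start, 1), min(end, length) + 1):
            labels[i - 1] = "gray"
    # Experimentally observed residues override gray (assumed precedence).
    for start, end in pdb_regions:
        for i in range(max(start, 1), min(end, length) + 1):
            labels[i - 1] = "pdb"
    return labels


def category_fractions(labels: List[str]) -> Dict[str, float]:
    """Return the fraction of residues in each category."""
    total = len(labels)
    return {name: labels.count(name) / total for name in ("pdb", "gray", "dark")}


if __name__ == "__main__":
    # Toy protein of 10 residues with made-up coverage.
    labels = classify_residues(10, pdb_regions=[(1, 2)], model_regions=[(2, 6)])
    print(category_fractions(labels))
```

Summing these per-residue labels over every protein in a proteome would give the overall PDB, gray, and dark proportions; the toy coordinates here are invented and do not reflect any real protein.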
Systems biology
Systems biology focuses on how biomolecules coordinate to perform cellular functions. To date, over 4 billion biochemical reactions have been documented1; however, much remains unknown. For example, most past studies were insensitive to the more than 200 different types of protein post-translational modification (PTM)2, many of which are critical for cellular function3. Some PTMs can now be measured at high throughput4, producing a wealth of data with the potential to revolutionize biomedicine5. However, the complexity of these data overwhelms current analysis methods6.
At the BioVis Centre, we are developing novel analysis strategies addressing some of the more challenging datasets in systems biology. We are currently focusing on data from experiments that track dynamic changes in (typically) tens of thousands of cellular events (e.g., PTMs, gene expression). Our strategies use machine learning, statistical analysis, and visual analytics to create intuitive visualizations7-9 that can provide new insights into cellular processes (e.g., mitosis10 or insulin response11).
3D genomics
DNA sequencing technologies are advancing rapidly and beginning to transform medical practice. However, this advance is held back by major gaps in our understanding of the genome. One of these gaps is our limited knowledge of how chromosomes are organised in 3D. Our understanding of this organisation is beginning to improve, thanks to exciting new experimental techniques that can detect pairs of chromosomal regions within close spatial proximity1. However, these experiments generate billions of 3D contacts, which remain challenging to interpret with current analysis methods2.
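To make the scale of the problem concrete, the contact pairs described above are commonly summarised by grouping positions into fixed-size genomic bins and counting contacts per pair of bins. The sketch below shows that general idea only. It is not the method used by any tool mentioned here, and the function name, the single-chromosome assumption, and the base-pair coordinates are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def bin_contacts(
    contacts: Iterable[Tuple[int, int]],
    bin_size: int,
) -> Counter:
    """Count contacts per pair of bins on a single chromosome.

    Each contact is a pair of base-pair positions (assumed format).
    Bin pairs are stored as (low_bin, high_bin) so that the contact
    (a, b) and the contact (b, a) are counted together.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    counts: Counter = Counter()
    for pos_a, pos_b in contacts:
        low, high = sorted((pos_a // bin_size, pos_b // bin_size))
        counts[(low, high)] += 1
    return counts


if __name__ == "__main__":
    # Three made-up contacts, binned at 1,000 bp resolution.
    toy_contacts = [(100, 2500), (2600, 150), (900, 950)]
    for bin_pair, n in sorted(bin_contacts(toy_contacts, 1000).items()):
        print(bin_pair, n)
```

Storing counts sparsely in a `Counter` avoids allocating a full bin-by-bin matrix, which matters when most bin pairs have no contacts; a larger `bin_size` trades resolution for fewer, better-populated bins.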
At the BioVis Centre, we aim to make exploring 3D genomics data more effective and intuitive. To achieve this, we are using visual analytics3, statistical analysis, and crowdsourced evaluation4,5 to create an online tool ('Rondo') that helps reveal how 3D organisation relates to other genomic features (e.g., genes, promoters, epigenomic marks). This, in turn, will help researchers unravel the information encoded in DNA sequences and gain new insight into molecular processes underlying health and disease – such as genome rearrangements occurring in cancer6.