24 Oct 2018

What healthy looks like: Thousands-strong DNA database nears completion

The Medical Genome Reference Bank, a database of the genetic information of 4000 healthy elderly Australians, is nearing completion.

In a research paper published today in the European Journal of Human Genetics, the team behind the Medical Genome Reference Bank (MGRB) have reported on the database, its design and its progress towards completion.

The MGRB will act as an important resource for researchers and clinicians, helping them to understand which genetic changes are likely to be important in disease.

A world-class resource

The largest resource of its kind in the world, the MGRB is the first database to gather the entire genetic sequences of thousands of healthy older people. It is also Australia’s largest single genome cohort, the country’s biggest single collection of whole human genomes.

Launched in 2012 with 1200 genomes, the MGRB now contains the entire DNA sequence (whole genome) of 4000 Australians over the age of 70. Every individual included in the MGRB is healthy. For the purposes of the study, ‘healthy’ means that every individual has been free of three major classes of disease (cardiovascular disease, dementia and cancer) for seven decades or more.

DNA sequencing is now complete for all 4000 genomes in the MGRB. The research team today reports that three-quarters of the 4000 samples have been fully analysed, and the remaining 1000 sequences will be analysed before the end of 2019.

Already, researchers and clinicians in Australia and around the world are using the MGRB to accelerate medical research and diagnose disease. Through a carefully designed tiered access model of data sharing and management (see FAQs below), the MGRB maintains participant privacy and confidentiality, while maximising research and clinical usage of the database.

What does healthy look like?

Most genomic cohorts focus on individuals with a pre-existing medical condition – but the MGRB is different. The database is designed to include only the DNA of healthy older people (as defined above) – and as a result, the genome sequences in the MGRB help to paint a picture of ‘what healthy looks like’ at the level of the DNA. 

Each of us harbours millions of ‘variants’ within our DNA. These are places in our genome sequence where we differ from others. Most variants have little impact, but some drive disease – and a major challenge for personalised medicine is to understand which is which.

Because the MGRB contains only the DNA of older people who are still healthy, it is expected to be relatively free of variants associated with disease. This makes it a powerful filter, or ‘control’, for accelerating genomic discovery in medical research. In addition, it will aid in the diagnosis of genetic disease and may shed light on mechanisms of healthy aging.

Professor David Thomas (Garvan Institute of Medical Research), who co-leads the MGRB project, says, “The Medical Genome Reference Bank can tell us much about what it means to grow old but remain well, and is a powerful tool to help us deconstruct the genetics of common diseases.”

A collaboration across NSW and Australia

The MGRB brings together a range of Australian organisations across government, aging research, genomics and infrastructure.

An initiative of the Kinghorn Centre for Clinical Genomics at the Garvan Institute of Medical Research, the MGRB includes genomic information from participants in two leading Australian research studies in older people: the ASPREE (ASPirin in Reducing Events in the Elderly) study (Monash University, Melbourne) and the 45 and Up study (Sax Institute, Sydney). Critical compute power has been contributed by the National Computational Infrastructure (NCI).

The development of the MGRB has been funded by NSW Health, through the Office of Health and Medical Research. It is one of three major projects within the Sydney Genomics Collaborative.


Frequently asked questions

  1. What is a genome?

Our genome is the complete set of genetic information we inherit from our parents, encoded within 2 metres of DNA packed tightly into each of our cells as chromosomes. A human genome is approximately 6 billion bases, or letters of DNA code.

  2. What is genomics?

Genomics is the study of the structure and function of the genome of an organism.

  3. What is DNA sequencing?

DNA sequencing is a laboratory technique used to determine the sequence of units or bases in a DNA molecule. Sequencing methods have changed over time; the machines used by the Kinghorn Centre for Clinical Genomics use complex chemistry and high-resolution optics to determine the sequence.

The DNA sequence is a series of letters – As, Cs, Gs, and Ts – that represent the order of base pairs in a person’s DNA. The sequence of a single human genome has approximately 6,000 million letters to read and interpret. In a sequencing laboratory, machines break the DNA up into manageable segments and read the order of the DNA bases or letters. Computers are then used to compare the DNA sequence with other sequences to locate the differences or variants.
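In practice, sequencers produce millions of short reads that software aligns to a reference genome before variants are identified. The short Python sketch below illustrates only the final comparison step, under a simplifying assumption: a stretch of one person's sequence has already been aligned to the reference and has the same length, so only single-letter substitutions are detected (no insertions or deletions). The function name find_substitutions is hypothetical and is not part of any MGRB or Garvan software.

```python
from typing import List, Tuple


def find_substitutions(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are already aligned and of equal length, so only
    single-letter substitutions can be found, not insertions or deletions.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned and the same length")
    valid = set("ACGT")
    variants = []
    for i, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper()), start=1):
        if ref_base not in valid or alt_base not in valid:
            raise ValueError(f"unexpected letter at position {i}")
        if ref_base != alt_base:
            variants.append((i, ref_base, alt_base))
    return variants


if __name__ == "__main__":
    reference = "ACGTTAGCCA"
    sample = "ACGTCAGCTA"
    print(find_substitutions(reference, sample))  # [(5, 'T', 'C'), (9, 'C', 'T')]
```

Each tuple in the result is one variant: the place where this person's sequence differs from the reference.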

  4. What is a genomic cohort?

A cohort is a group of people who share one or more important characteristics. It could be that they have a particular genetic disorder, or, in the case of the MGRB cohort, that they are over 70 years old and healthy. Cohort studies usually follow a group over time and help researchers learn how a range of factors affect health and disease. In a genomic cohort, the genome of each individual is sequenced so that it can be compared with others, both within the cohort and beyond.

  5. How does the Medical Genome Reference Bank differ from other genomic cohorts?

As a publicly available genomic cohort that has been designed to be depleted of disease, the MGRB is one of the first of its kind in the world. It is also the largest single Australian genomic cohort.

  6. What makes the Medical Genome Reference Bank useful?

A key challenge in interpreting whole genome sequences is working out which of an individual’s 3.2 million variants cause disease or adverse drug reactions, and which are benign. One of the most valuable filters for distinguishing disease-causing from non-disease-causing variants is how often the variant occurs in a healthy, aged population, whose members carry fewer genetic drivers of disease.

The MGRB will act as this filter, or negative control group, because every participant is healthy. It will support accurate diagnosis and the discovery of genetic variants that underpin disease, should significantly speed up genomic research, and could also provide insight into the genetics of healthy aging.
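The frequency filter described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MGRB's actual pipeline: the variant key (chromosome, position, reference base, alternate base), the 1% cut-off, and the helper names allele_frequency and filter_by_control_frequency are all hypothetical, and the example frequencies are made up. Real clinical filtering also weighs factors such as inheritance pattern and data quality.

```python
from typing import Dict, List, Tuple

# Hypothetical variant key: (chromosome, position, reference base, alternate base).
Variant = Tuple[str, int, str, str]


def allele_frequency(alt_allele_count: int, n_individuals: int) -> float:
    """Fraction of chromosomes in the cohort that carry the alternate allele.

    Each person has two copies of each autosome, so a cohort of n people
    contributes 2 * n alleles at an autosomal position.
    """
    if n_individuals <= 0:
        raise ValueError("cohort must contain at least one individual")
    return alt_allele_count / (2 * n_individuals)


def filter_by_control_frequency(
    candidates: List[Variant],
    control_freq: Dict[Variant, float],
    max_freq: float = 0.01,  # assumed cut-off, for illustration only
) -> List[Variant]:
    """Keep only candidate variants that are rare in the healthy control cohort.

    A variant that is common among healthy older people is unlikely to cause
    serious disease, so it is dropped. A variant never observed in the
    controls is treated as having frequency 0.0 and is kept.
    """
    return [v for v in candidates if control_freq.get(v, 0.0) <= max_freq]


if __name__ == "__main__":
    # Made-up data, not real MGRB frequencies.
    candidates = [("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T"), ("chr3", 3000, "G", "A")]
    control_freq = {
        ("chr1", 1000, "A", "G"): allele_frequency(960, 4000),  # 0.12: common, filtered out
        ("chr2", 2000, "C", "T"): allele_frequency(16, 4000),   # 0.002: rare, kept
    }
    print(filter_by_control_frequency(candidates, control_freq))
    # [('chr2', 2000, 'C', 'T'), ('chr3', 3000, 'G', 'A')]
```

Treating an unseen variant as frequency zero reflects the idea behind the reference bank: absence from thousands of healthy genomes is itself informative, while a variant carried by many healthy people can usually be set aside.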

  7. What is the Medical Genome Reference Bank data portal?

The MGRB data portal is the point of access to the genomic and other information the MGRB contains. Through the portal, curated data is openly accessible to the Australian and international research community.

  8. How is the privacy of individual participants in the Medical Genome Reference Bank maintained? How is access to the Reference Bank controlled?

The MGRB database is structured so that all data is non-identifiable in the hands of Garvan and through its data portal.

The governance framework used by the MGRB is in line with international best practice, aiming to enable access for researchers while maintaining the privacy of individuals in the cohort.

There are three tiers of access to the MGRB data. The first is making queries through the online data portal, which provides summary data only. The second tier permits access to all of the genomic data from the portal, along with basic clinical information including gender, height, weight and age. The third tier enables access to deeper phenotypic information through the participating cohort studies. Researchers must apply for tier two and tier three access, and their applications are considered by a data access review board.
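As a rough illustration, the tiered model can be expressed as a small access-control check in Python. This is a hypothetical sketch, not the MGRB's real system: the Tier names, the data category labels, and the assumption that each tier also includes everything granted by the tiers below it are all illustrative choices, not details confirmed by the source.

```python
from enum import IntEnum


class Tier(IntEnum):
    SUMMARY = 1     # portal queries returning summary data only
    GENOMIC = 2     # all genomic data plus basic clinical fields
    PHENOTYPIC = 3  # deeper phenotypic information via the participating cohort


# Hypothetical data categories and the lowest tier assumed to grant each one.
MIN_TIER = {
    "summary_statistics": Tier.SUMMARY,
    "genomic_data": Tier.GENOMIC,
    "basic_clinical": Tier.GENOMIC,  # gender, height, weight, age
    "deep_phenotype": Tier.PHENOTYPIC,
}


def effective_tier(requested: Tier, approved_by_review_board: bool) -> Tier:
    """Tiers 2 and 3 need an approved application; otherwise fall back to tier 1."""
    if requested > Tier.SUMMARY and not approved_by_review_board:
        return Tier.SUMMARY
    return requested


def can_access(requested: Tier, approved_by_review_board: bool, category: str) -> bool:
    """Assumes tiers are cumulative: a higher tier includes all lower-tier data."""
    tier = effective_tier(requested, approved_by_review_board)
    return tier >= MIN_TIER[category]


if __name__ == "__main__":
    print(can_access(Tier.GENOMIC, False, "genomic_data"))  # False: not yet approved
    print(can_access(Tier.GENOMIC, True, "basic_clinical"))  # True
```

Falling back to tier one when an application has not been approved mirrors the description above: anyone can query summary data, but anything more detailed passes through the review board.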


About the Garvan Institute of Medical Research

The Garvan Institute of Medical Research is one of Australia's largest medical research institutions and is at the forefront of next-generation genomic sequencing in Australia. Garvan’s main research areas are: cancer, diabetes and metabolism, immunology and inflammation, osteoporosis and bone biology, and neuroscience. Garvan’s mission is to make significant contributions to medical science that will change the directions of science and medicine and have major impacts on human health. In 2012, Garvan established Australia’s first purpose‐built facility for undertaking clinical-grade genome sequencing and large-scale research projects. The Kinghorn Centre for Clinical Genomics (KCCG) researchers undertake collaborative projects and genome‐based studies to improve genome interpretation, with the aim of advancing the use of genomic information in patient care.

About the Sydney Genomics Collaborative

The Sydney Genomics Collaborative is a $24 million, four-year program funded by NSW Health. Established in June 2014, it aims to boost genomic research across NSW and to shed light on the genetic basis of inherited diseases and disorders with a genetic component, including cancer.

The Collaborative comprises three programs:

(1) Medical Genome Reference Bank – a resource that will eventually contain approximately 4,000 whole genome sequences from healthy, aged people to be used for control purposes in disease-specific genomic research.

(2) NSW Genomics Collaborative Grants – funding for researchers to undertake whole-genome sequencing to improve understanding of the genetic causes of disease.

(3) Genomic Cancer Medicine Program – a research program dedicated to applying genomics to the understanding, early detection, prevention and management of cancer, led by the Head of Garvan’s Cancer Division and Director of The Kinghorn Cancer Centre, Professor David Thomas.

About the ASPREE study

The ASPREE study (ASPirin in Reducing Events in the Elderly) was a randomised, double-blind, placebo-controlled study of aspirin for the primary prevention of cardiovascular disease, dementia, depression and some cancers in 19,000 participants in Australia and the USA.

Led by Monash University in Australia and the Berman Centre for Outcomes and Clinical Research in the USA, the trial used a composite outcome of years of life free from physical and cognitive disability. More than 16,700 people aged 70 and over living in southeastern Australia enrolled in the ASPREE study, along with 2,411 participants in the US.

ASPREE was funded by the National Institute on Aging (NIH in the USA), the NHMRC, Monash University and the Victorian Cancer Agency. The ASPREE Healthy Ageing Biobank was funded by the CSIRO, the National Cancer Institute (NIH in the USA) and Monash University.

About the 45 and Up Study

The 45 and Up Study (Sax Institute, Sydney) is the largest ongoing study of healthy ageing in the southern hemisphere, involving more than a quarter of a million Australians. It is a major research tool used by researchers and policy makers to better understand how Australians are ageing and using health services, how to prevent and manage ill health and disability, and how this knowledge can guide decisions about our health system.