Sample Data

Kinghorn Centre for Clinical Genomics Sample Sequencing Data

These data are from the Coriell Cell Repository NA12878 reference cell line, which has been extensively analysed by the Genome in a Bottle Consortium. The sequencing libraries were generated with Illumina’s TruSeq Nano V2.5 kit on the Hamilton Microlab STAR robotics platform, with insert sizes above 400 bp. Each library was sequenced on a single lane of an Illumina HiSeq X patterned flow cell, yielding over 130 Gb with more than 83% of bases at Q30 or higher, in just 2.8 days. The four data sets are of similar quality and are provided so that you can assess the reproducibility of the technology. Each data set substantially exceeds the minimum coverage and quality guaranteed by Illumina and is indicative of the potential of the Illumina HiSeq X Ten sequencing system.
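The Q30 figure is the fraction of base calls whose Phred quality score is 30 or more, i.e. an estimated error rate of at most 1 in 1,000. As a minimal sketch, assuming the FASTQ files use the standard four-line record layout and Phred+33 quality encoding (neither is stated on this page), the fraction can be recomputed from one of the fastq.gz files in Python. The function names are illustrative and not part of any published tool.

```python
import gzip
from typing import Iterable, Iterator


def fastq_qualities(path: str) -> Iterator[str]:
    """Yield the quality line of each record in a plain or gzipped FASTQ file.

    Assumes the standard four-line record layout: header, sequence, '+', qualities.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 3:  # 4th line of each record holds the qualities
                yield line.rstrip("\r\n")


def q30_fraction(qualities: Iterable[str], offset: int = 33) -> float:
    """Fraction of base calls with Phred score >= 30.

    Assumes Phred+33 encoding (an assumption), so the score is ord(char) - 33.
    """
    total = 0
    high = 0
    for qual in qualities:
        total += len(qual)
        high += sum(1 for ch in qual if ord(ch) - offset >= 30)
    return high / total if total else 0.0
```

For example, `q30_fraction(fastq_qualities("NA12878_V2.5_Robot_3_R1.fastq.gz"))` returns the R1 fraction for replicate 3. Pure Python is slow on files with hundreds of millions of reads, so treat this as an illustration of the calculation rather than a production QC tool.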

Each of the four data sets consists of raw paired-end reads (fastq.gz files) and the results of running the GATK DNA-seq Best Practices pipeline on each library independently, using the recommended parameters for whole-genome sequencing.
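The `hc.vqsr.vep` suffix on the VCF file names suggests HaplotypeCaller calls filtered with VQSR and annotated with VEP, although this page does not say so explicitly. Assuming the filtering outcome is recorded in the standard VCF FILTER column (`PASS` or a filter name), a minimal Python sketch can tally the records per FILTER value. The helper names are hypothetical; `gzip.open` can read bgzip-compressed files because BGZF is a series of concatenated gzip members.

```python
import gzip
from collections import Counter
from typing import Iterable


def filter_counts(lines: Iterable[str]) -> Counter:
    """Count VCF data records by their FILTER value (the 7th tab-separated column)."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):  # skip meta-information (##) and column header (#CHROM)
            continue
        fields = line.rstrip("\r\n").split("\t")
        counts[fields[6]] += 1
    return counts


def vcf_filter_counts(path: str) -> Counter:
    """Open a bgzip-compressed VCF with the standard gzip module and tally FILTER values."""
    with gzip.open(path, "rt") as handle:
        return filter_counts(handle)
```

For example, `vcf_filter_counts("NA12878_V2.5_Robot_3.hc.vqsr.vep.vcf.gz").most_common()` lists FILTER values by frequency; comparing the `PASS` counts across replicates is one simple way to look at reproducibility.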

Download the data to your computer or server using the links below.

Human sample data


File type  Replicate 3                                                Replicate 4
FASTQ R1   NA12878_V2.5_Robot_3_R1.fastq.gz                           NA12878_V2.5_Robot_4_R1.fastq.gz
FASTQ R2   NA12878_V2.5_Robot_3_R2.fastq.gz                           NA12878_V2.5_Robot_4_R2.fastq.gz
BAM        NA12878_V2.5_Robot_3.dedup.realigned.recalibrated.bam      NA12878_V2.5_Robot_4.dedup.realigned.recalibrated.bam
BAI        NA12878_V2.5_Robot_3.dedup.realigned.recalibrated.bam.bai  NA12878_V2.5_Robot_4.dedup.realigned.recalibrated.bam.bai
VCF        NA12878_V2.5_Robot_3.hc.vqsr.vep.vcf.gz                    NA12878_V2.5_Robot_4.hc.vqsr.vep.vcf.gz
TBI        NA12878_V2.5_Robot_3.hc.vqsr.vep.vcf.gz.tbi                NA12878_V2.5_Robot_4.hc.vqsr.vep.vcf.gz.tbi
MD5sum     NA12878_V2.5_Robot.md5sum
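The MD5sum file lets you check that each download arrived intact. Its contents are not reproduced on this page; assuming it follows the usual GNU `md5sum` output format (a hex digest, whitespace, then a file name, optionally prefixed with `*`), running `md5sum -c NA12878_V2.5_Robot.md5sum` in the download directory should be enough. For systems without GNU coreutils, a minimal Python equivalent is sketched here; the function names are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Dict


def parse_md5sum(text: str) -> Dict[str, str]:
    """Map file name -> expected digest from md5sum-style lines ("<digest>  <name>")."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(None, 1)
        expected[name.lstrip("*")] = digest.lower()  # '*' marks binary mode in md5sum output
    return expected


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a (possibly very large) file through MD5 in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(md5_file: Path) -> Dict[str, bool]:
    """Check every listed file that exists in the same directory as the checksum file."""
    expected = parse_md5sum(md5_file.read_text())
    results = {}
    for name, digest in expected.items():
        target = md5_file.parent / name
        if target.exists():  # files not yet downloaded are skipped
            results[name] = md5_of(target) == digest
    return results
```

`verify(Path("NA12878_V2.5_Robot.md5sum"))` returns `True` or `False` for each listed file present alongside it. Reading in fixed-size chunks keeps memory use flat even for the BAM files.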


Key metrics

Lane                   5            6            1            2
Read length            151bp PE     151bp PE     151bp PE     151bp PE
Raw Read Pairs (PF)    439,013,514  510,726,469  464,350,208  479,861,658
Raw Yield (Gb)         131.704      153.218      139.305      143.958
% bases >=Q30 (R1/R2)  92.39/81.23  93.18/73.37  89.89/77.44  93.00/78.75
% bases >=Q30 (mean)   86.81        83.28        83.67        85.88
Alignment %            98.448       98.972       95.53        97.911
Duplication %          10.6         11.549       10.592       12.032
Coverage (mean)        34.8873      39.51173     35.69027     37.34294
Coverage (stdev)       10.52019     11.37284     10.24081     10.57043
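Two rows of the key metrics table can be cross-checked with simple arithmetic. The mean Q30 value is the unweighted average of the R1 and R2 values (lane 5: (92.39 + 81.23) / 2 = 86.81), which equals the pooled percentage because each pair contributes one R1 and one R2 read of equal length. The raw yield agrees, to the stated precision, with read pairs × 2 reads × 150 bases (lane 5: 439,013,514 × 300 = 131,704,054,200 bases, about 131.704 Gb) rather than 151 bases; the page does not say why one cycle per read is excluded, so treat that reading as an inference. This Python sketch reproduces both checks from values transcribed from the table.

```python
# Values transcribed from the key metrics table, in lane order 5, 6, 1, 2.
LANES = [5, 6, 1, 2]
READ_PAIRS = [439_013_514, 510_726_469, 464_350_208, 479_861_658]
REPORTED_YIELD_GB = [131.704, 153.218, 139.305, 143.958]
Q30_R1 = [92.39, 93.18, 89.89, 93.00]
Q30_R2 = [81.23, 73.37, 77.44, 78.75]
REPORTED_Q30_MEAN = [86.81, 83.28, 83.67, 85.88]


def yield_gb(read_pairs: int, bases_per_read: int) -> float:
    """Paired-end yield in Gb: pairs x 2 reads x bases per read / 1e9."""
    return read_pairs * 2 * bases_per_read / 1e9


def mean_q30(r1: float, r2: float) -> float:
    """Unweighted mean of the R1 and R2 Q30 percentages (valid when both reads have equal length)."""
    return (r1 + r2) / 2


if __name__ == "__main__":
    for lane, pairs, reported in zip(LANES, READ_PAIRS, REPORTED_YIELD_GB):
        print(
            f"lane {lane}: 150 bp/read -> {yield_gb(pairs, 150):.3f} Gb, "
            f"151 bp/read -> {yield_gb(pairs, 151):.3f} Gb, reported {reported} Gb"
        )
    for lane, r1, r2, reported in zip(LANES, Q30_R1, Q30_R2, REPORTED_Q30_MEAN):
        print(f"lane {lane}: mean Q30 {mean_q30(r1, r2):.3f}%, reported {reported}%")
```

Running the script prints the candidate yields beside each reported value, making the 150-base match easy to see lane by lane.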


Mouse sample data

Coming soon.

© Garvan Institute 2017