Cell line authentication at Garvan Molecular Genetics
The use of misidentified cell lines in cancer and other biomedical research continues to occur. The DSMZ cell bank in Germany found that 18% of human cancer cell lines supplied by originators were misidentified. An extensive list of cross-contaminated and misidentified cell lines has been collated from scientific literature. When published in 2010, the list contained a total of 360 affected cell lines, most with no known authentic stocks.
Consequently, a significant amount of money may be spent on research using misidentified cell lines worldwide. Scientists may believe or claim that they are working with cells derived from one individual or animal species, only to eventually learn that the cells were derived from a different individual or species altogether.
In the past, scientists may have used the argument that inexpensive and reliable technologies for unambiguously identifying cell lines did not exist as a reason for not authenticating their cell lines. Now, however, standardised, simple and rapid methods are available to authenticate human cell lines.
How it works
STR (short tandem repeat) loci consist of short, repetitive sequence elements 3–7 base pairs in length. Examples of tandem repeats with different motif lengths:
- Pentanucleotide: AAAGA AAAGA AAAGA AAAGA
- Tetranucleotide: AAAG AAAG AAAG AAAG
- Trinucleotide: CTT CTT CTT CTT CTT
- Dinucleotide: AG AG AG AG AG AG
The number of repeats in STR loci can be highly variable among individuals, which makes these genetic markers effective for human identification or cell line authentication purposes. STRs have become popular DNA repeat markers because they are easily amplified by the polymerase chain reaction (PCR) using primers that bind to the flanking regions surrounding the STR repeat. Multiple STR loci can be examined simultaneously to create a DNA profile.
For human identity analysis, tetra- and pentanucleotide markers are preferred for their good balance of “ease of interpretation” and “variation found in nature”. Alleles of STR loci are differentiated by the number of copies of the repeat sequence contained within the amplified region and are distinguished from one another using fluorescence detection following capillary electrophoretic separation. Four different fluorochromes are used so that the analysis can be performed in a single capillary run.
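To illustrate how alleles differ by repeat number, the following minimal Python sketch counts the longest uninterrupted run of a repeat motif in a sequence. The flanking sequences and the function name `longest_tandem_run` are hypothetical and chosen only for illustration; in practice alleles are sized by capillary electrophoresis rather than by reading the sequence.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    # A lookahead is used so that runs starting at every position are considered,
    # including runs that overlap an earlier, shorter run.
    pattern = re.compile(f"(?=((?:{re.escape(motif.upper())})+))")
    runs = [len(m.group(1)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Two hypothetical alleles of the same tetranucleotide locus (made-up flanks).
allele_a = "GGTC" + "AAAG" * 7 + "TTCA"
allele_b = "GGTC" + "AAAG" * 9 + "TTCA"

print(longest_tandem_run(allele_a, "AAAG"))  # 7 repeats
print(longest_tandem_run(allele_b, "AAAG"))  # 9 repeats
```

An individual carrying both of these alleles would be recorded as 7,9 at this locus.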
For the Cell Line Identification Service at the Garvan Institute, the PowerPlex® 18D System is used. It covers the following 18 markers (seventeen STR loci and Amelogenin): D3S1358, TH01, D21S11, D18S51, Penta E, D5S818, D13S317, D7S820, D16S539, CSF1PO, Penta D, Amelogenin, vWA, D8S1179, TPOX, FGA, D19S433 and D2S1338. After analysis, cell lines are considered to be a match if their profiles are more than 80% identical. This service will provide an authentication certificate that identifies reference sample IDs via Garvan’s cell line identification database.
Each run on the 3130XL instrument includes an allelic ladder sample, which is used to adjust the results of the actual samples to the specific run conditions. The GeneMapper analysis software calculates the allele call for each sample peak from the predefined regional bins and the precise peak alignment of the allelic ladder.
Comparing Results to Cellbank and ATCC
Retrospective studies on 500 human cell lines indicate that a minimum of eight core STR markers (D5S818, D13S317, D7S820, D16S539, vWA, TH01, TPOX and CSF1PO) is recommended to show relatedness between cell lines, to uniquely identify human cells and for profile comparisons. Occasionally a cell line will completely lack alleles at a locus, in which case a “null” is documented for that locus as part of the eight loci.
The probability of two cell lines having identical STR profiles at the 8 core STR markers is 1 in 200,000,000. Several providers therefore use fewer markers than the 18 used for certificates from Garvan Molecular Genetics. ATCC or Cellbank Australia certificates may show 11 or 16 marker results. The new Garvan database will contain information from older certificates (hence fewer markers) and newer results with more markers. Over time, the number of results with 18 markers will grow.
Authentication using STR-profiling means establishing an acceptable degree of identity matching between the test cell line and the original donor tissue or repository cell stock. Authentication using STR-profiling systems currently available for human identification may not imply freedom from inter-species (non-human) cell contamination and will not lead to direct detection of non-human DNA contamination. Detection of interspecies cell cross-contamination will require other methods such as isoenzyme analysis or PCR with amplification primers that are complementary to non-human species.
The limitations of STR profiling, described above, emphasise the point that this technology addresses only a single component of cell authentication, namely a means of establishing identity. STR profiling fills a gap in the current authentication paradigm for human cells, but STR profiling should not be construed as a stand-alone authentication method as it does not replace or serve the function of technologies designed to assess cell purity (absence of interspecies cross-contamination / absence of adventitious contaminants such as viruses, microbes and mollicutes), phenotype, morphology, or cell ploidy.
The matching criterion is based on an algorithm comparing the number of shared alleles between two cell line samples, expressed as a percentage. A previously authenticated sample is selected as a 'reference' profile, while the sample undergoing authentication is the 'questioned' profile. The percent match between two cell lines is calculated as follows (homozygous alleles are counted as one allele):
Percent match = (number of shared alleles in both STR profiles ÷ total number of alleles in the questioned profile) × 100
Cell lines with a >80 % match are considered to be related, that is, derived from a common ancestry. Cell lines with a match between 55 % and 80 % require further profiling for authentication of relatedness. Cell line samples with a common ancestry may come from the same cell line or from two separate cell lines that were derived from the same donor; one cell line may be derived from the other; or one cell line may be misidentified or cross-contaminated.
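The percent match calculation described above can be sketched in Python as follows. This is a minimal illustration, not the software used by the service: the function name `percent_match`, the data layout (one set of allele names per locus) and both example profiles are assumptions made for this example. Storing alleles in a set means a homozygous locus is counted as one allele.

```python
def percent_match(questioned: dict[str, set[str]],
                  reference: dict[str, set[str]]) -> float:
    """Shared alleles divided by alleles in the questioned profile, as a percentage."""
    shared = 0
    total = 0
    for locus, q_alleles in questioned.items():
        total += len(q_alleles)
        # Alleles present at this locus in both profiles.
        shared += len(q_alleles & reference.get(locus, set()))
    if total == 0:
        raise ValueError("questioned profile contains no alleles")
    return 100.0 * shared / total


# Hypothetical profiles over the eight core loci (values are made up).
questioned = {
    "D5S818": {"11", "12"}, "D13S317": {"12"}, "D7S820": {"8", "10"},
    "D16S539": {"9", "11"}, "vWA": {"16", "18"}, "TH01": {"7", "9.3"},
    "TPOX": {"8"}, "CSF1PO": {"10", "12"},
}
reference = dict(questioned, vWA={"16", "17"})  # differs by one vWA allele

# 13 of the 14 questioned alleles are shared.
print(f"{percent_match(questioned, reference):.1f} % match")  # 92.9 % match
```

Because the denominator is the questioned profile, swapping the two arguments can give a different percentage when the profiles contain different numbers of alleles.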
Small variations may occur in any of these situations when cell lines are maintained in culture, and so some degree of profile variation must be allowed for correct authentication. The choice of >80 % as a suitable threshold is based on published work from the ASN-0002 Standard workgroup, looking at the 8 core loci in related cell line samples from five cell banks. According to this analysis, 98% of related cell line samples can be successfully authenticated using the matching algorithm, with >80 % match as the threshold.
In a small number of cell lines, STR profiles show a greater degree of instability, resulting in a percent match of <80 %. Published work from the ASN-0002 Standard workgroup indicates that unrelated cell lines generally show percent match figures of 55 % or less. Based on this match threshold, cell lines that are below 56 % match are considered unrelated. Cell lines with a percent match of 56-80 % are probably unrelated, but this range includes a small number of related cell lines, and further data may be needed to confirm or refute this conclusion.
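The thresholds above can be summarised in a short Python sketch. The function name `interpret_match` and the wording of the returned labels are assumptions for illustration; the cut-offs follow the text (above 80 % related, 56 % to 80 % needing further data, below 56 % unrelated).

```python
def interpret_match(percent: float) -> str:
    """Map a percent match onto the interpretation bands described in the text."""
    if not 0 <= percent <= 100:
        raise ValueError("percent match must lie between 0 and 100")
    if percent > 80:
        return "related (common ancestry)"
    if percent >= 56:
        return "indeterminate (further profiling needed)"
    return "unrelated"


for value in (92.9, 80.0, 56.0, 40.0):
    print(value, "->", interpret_match(value))
```

A result in the middle band is not a conclusion in itself; as noted above, the presumed identity and history of the cell line also have to be taken into account.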
This matching algorithm can be used to define cell lines as unique, authenticated, misidentified, or cross-contaminated. Additional information may be needed for these conclusions, including the presumed identity of the cell line and what is known about its history. In cases where a hybrid cell line is newly formed and its STR profile is generated for the first time, that profile should be compared first to the parental cell line’s profile. An already established hybrid with its profile entered in the database may be treated like any other cell line.
In addition to applying the algorithm, it is essential to look at the quality of the STR profile to determine if the sample is appropriate for interpretation (e.g. a degraded sample will be uninterpretable), or if a mixture is likely to be present.
Genetic drift, allelic dropouts and off-ladder alleles
STR profiles from some cell lines may vary slightly as cells are cultured, and this is a potential limitation of the technique. Cell lines displaying microsatellite instability appear particularly susceptible to genetic drift with continued culture.
Allelic dropout or stochastic effect is the failure to detect an allele within a DNA sample, or failure to amplify an allele during the PCR. Allelic dropout may be the result of a mutation or an allele outside the normal call range of a particular locus that goes undetected. It can also be caused by too much DNA in the PCR or degraded DNA. Usually allelic dropouts are the result of a peak imbalance in tumor cell lines due to aneuploidy. In diploid samples, PCRs involving template levels below approximately 100 pg of DNA or about 17 diploid copies of genomic DNA have been shown to exhibit allelic drop-out. False homozygous results are obtained if one of the alleles fails to amplify. Stochastic effects may be overcome by increasing DNA template or increasing the PCR cycle number.
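The relation between 100 pg of template and roughly 17 diploid copies can be checked with simple arithmetic. The sketch below assumes a mass of about 6 pg of DNA per diploid human genome, a commonly quoted approximation that is not stated in the text above; the constant and function names are illustrative only.

```python
# Assumption: one diploid human genome weighs roughly 6 pg (approximate figure).
PG_PER_DIPLOID_GENOME = 6.0


def diploid_copies(template_pg: float) -> float:
    """Estimate how many diploid genome copies a given DNA mass represents."""
    return template_pg / PG_PER_DIPLOID_GENOME


copies = diploid_copies(100.0)
print(f"100 pg is about {copies:.0f} diploid genome copies")
```

With so few template molecules, chance alone can leave one allele under-represented in the early PCR cycles, which is why low-template samples are prone to drop-out.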
However, allelic dropout is frequently observed in tumor cell lines, where it is referred to as loss of heterozygosity (LOH). The LOH may be due to a mutation at one of the alleles that prevents PCR amplification, or to loss of the chromosome or chromosomal region containing the dropped-out allele. A review of the literature shows that the spontaneous mutation rate of cultured cells is between 10⁻⁶ and 10⁻⁷. This rate is dependent on cell line, media constituents, plating density and passaging procedures. High peak imbalance at multiple loci can be characteristic of a cell line.
Allelic ladders represent the most common alleles at each locus within the population. They were established through the evaluation of data from several hundred individuals. Alleles within the STR loci vary greatly between and among individuals. However, allelic ladders in commercial kits do not represent all possible alleles for all commercially available STR markers. Alleles that lie outside the allelic ‘bins’ are often referred to as ’off-ladder’ alleles. In some cases, off-ladder alleles are considered microvariants. It is not unusual to observe microvariants in continuous cell lines, especially tumor cell lines. Several of the loci used in DNA typing of human cell lines have rare alleles which contain partial repeats, known as microvariants.
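The idea of allelic bins and off-ladder alleles can be illustrated with a simplified Python sketch. This is not the GeneMapper algorithm: the ladder fragment sizes, the ±0.5 bp bin tolerance, the function name `call_allele` and the "OL" label are all assumptions chosen for illustration.

```python
def call_allele(peak_bp: float, ladder: dict[str, float],
                tolerance_bp: float = 0.5) -> str:
    """Assign a peak to the nearest ladder allele, or 'OL' if it falls outside every bin."""
    name, size = min(ladder.items(), key=lambda item: abs(item[1] - peak_bp))
    return name if abs(size - peak_bp) <= tolerance_bp else "OL"


# Hypothetical ladder for one tetranucleotide locus: allele name -> fragment size in bp.
ladder = {"6": 160.0, "7": 164.0, "8": 168.0, "9": 172.0}

print(call_allele(164.2, ladder))  # falls within the bin for allele 7
print(call_allele(166.1, ladder))  # between bins, reported as off-ladder
```

A peak reported as off-ladder is a prompt for closer inspection, since it may be a genuine microvariant or an artefact.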
The tetranucleotide STRs mostly used for human cell line identification have a 4 bp DNA sequence called the repeat motif. A microvariant occurs when one of the repeat units contains only one, two or three bases of the repeat motif. The microvariant allele is named with a whole number indicating the number of full repeats, followed by a decimal indicating how many bases occur in the partial repeat. Microvariants can complicate the analytical process and lead to inconclusive results. The cause of the complication is the small size difference: microvariant alleles can differ from a normal allele by a single base pair, so the instrument used for electrophoresis must maintain a resolution of one bp.
Cross-contamination involves the contamination of one human cell line with another during the expansion of a culture. In its early stages, cellular cross-contamination results in a mixture; in its later stages, rapid growth of the contaminating cell line usually leads to an STR profile more consistent with misidentification. Mixtures are unusual because the micro-environment of the culture usually favors the contaminating cell line. However, in some cases the contaminating cell line is observed as a distinct STR profile alongside that of the expected cell line. The STR profile of a mixed, cross-contaminated culture usually involves more than two alleles at three or more loci, as seen in Figure 4.
Figure 4. Cross-contaminated cell line in its early stages (mixture present), clearly seen in this example with three or more alleles at five loci.
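The naming rule for microvariants can be sketched in Python. The function name `allele_designation` and the input (length of the repeat region in base pairs) are assumptions for illustration. A widely known example of this notation is allele 9.3 at TH01, which denotes nine full repeats plus three additional bases.

```python
def allele_designation(repeat_region_bp: int, motif_bp: int = 4) -> str:
    """Name an allele as 'full repeats' or 'full repeats.partial bases'."""
    full_repeats, partial_bases = divmod(repeat_region_bp, motif_bp)
    if partial_bases == 0:
        return str(full_repeats)
    return f"{full_repeats}.{partial_bases}"


print(allele_designation(40))  # 10: ten complete 4 bp repeats
print(allele_designation(39))  # 9.3: nine complete repeats plus three bases
```

The two alleles in this example differ by a single base pair, which is why one bp resolution is needed to tell them apart.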
Some cell lines exhibit a three allele pattern at multiple loci, which can be characteristic of a cell line. The allele calls need to be confirmed with the profile in the database and additional investigation needs to be performed to rule out cell cross contamination.
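The rule of thumb for spotting a possible mixture (more than two alleles at three or more loci) can be expressed as a short Python sketch. The function names, the `min_loci` parameter and the example profile are hypothetical. A flag from this check is only a prompt for further investigation, since some cell lines show a characteristic three-allele pattern, as noted above.

```python
def loci_with_extra_alleles(profile: dict[str, set[str]]) -> list[str]:
    """Return the loci at which more than two alleles were called."""
    return sorted(locus for locus, alleles in profile.items() if len(alleles) > 2)


def possible_mixture(profile: dict[str, set[str]], min_loci: int = 3) -> bool:
    """Flag a profile when at least `min_loci` loci carry more than two alleles."""
    return len(loci_with_extra_alleles(profile)) >= min_loci


# Hypothetical profile with three alleles at three loci (values are made up).
suspect = {
    "D5S818": {"11", "12", "13"}, "D13S317": {"8", "11", "12"},
    "D7S820": {"8", "10"}, "vWA": {"15", "16", "18"}, "TPOX": {"8"},
}

print(loci_with_extra_alleles(suspect))
print(possible_mixture(suspect))
```

Before concluding that a culture is cross-contaminated, the flagged loci should be compared with the reference profile in the database.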
Among the more than 200 species in the Mollicutes class, eight species that are collectively called mycoplasmas, including Mycoplasma hyorhinis, Mycoplasma orale, Mycoplasma hominis, Mycoplasma argininis, Mycoplasma salivarium, Mycoplasma pirum, Mycoplasma fermentans, and Acholeplasma laidlawii, account for over 95% of isolates from cultured cells (Barile, 1979; McGarrity and Kotani, 1985; Bölske, 1988; McGarrity et al., 1992). In addition, we are able to detect Mycoplasma pneumoniae, Mycoplasma gallisepticum, Mycoplasma genitalium, Mycoplasma penetrans, Mycoplasma synoviae, Mycoplasma bovis, Mycoplasma hyopneumoniae, Ureaplasma urealyticum and Spiroplasma citri.
We perform four PCRs with primers that amplify a portion of the 23S rRNA gene and a conserved region within the 16S rRNA gene. These primers do not detect eukaryotic DNA or bacterial genera with a close phylogenetic relation to mycoplasmas, such as Clostridium, Lactobacillus and Streptococcus. Please submit 1 mL of supernatant for this test.
For pricing information please see our Molecular Genetics Shop website.
Please use the Cell Line Identification Service Sample Submission Form (see top of page) for submission of samples – email your completed form to email address.
You can submit DNA or cells for the Cell Line Identification Service. For the Mycoplasma test, it is recommended to submit supernatant. Please see the Cell Line Identification Service Sample Submission Guidelines for guidance on labelling your samples and submission requirements.
The turnaround time is a maximum of 14 days, but samples are usually processed within a week.