Indel Analysis Essay

  • 1. Collins, F.S. & Varmus, H. A new initiative on precision medicine. N. Engl. J. Med. 372, 793–795 (2015).

  • 2. Highnam, G. et al. An analytical framework for optimizing variant discovery from personal genomes. Nat. Commun. 6, 6275 (2015).

  • 3. Watson, J.D., Baker, T.A., Gann, A., Levine, M. & Losick, R. Molecular Biology of the Gene 7th edn. (Cold Spring Harbor Laboratory Press, 2013).

  • 4. Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).

  • 5. Zaidi, S. et al. De novo mutations in histone-modifying genes in congenital heart disease. Nature 498, 220–223 (2013).

  • 6. Iossifov, I. et al. De novo gene disruptions in children on the autistic spectrum. Neuron 74, 285–299 (2012).

  • 7. Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014).

  • 8. Gupta, R.S. Protein phylogenies and signature sequences: a reappraisal of evolutionary relationships among archaebacteria, eubacteria, and eukaryotes. Microbiol. Mol. Biol. Rev. 62, 1435–1491 (1998).

  • 9. Tian, D. et al. Single-nucleotide mutation rate increases close to insertions/deletions in eukaryotes. Nature 455, 105–108 (2008).

  • 10. MacArthur, D.G. et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science 335, 823–828 (2012).

  • 11. Fukuoka, S. et al. Loss of function of a proline-containing protein confers durable disease resistance in rice. Science 325, 998–1001 (2009).

  • 12. Denver, D.R. et al. High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature 430, 679–682 (2004).

  • 13. Montgomery, S.B. et al. The origin, evolution, and functional impact of short insertion-deletion variants identified in 179 human genomes. Genome Res. 23, 749–761 (2013).

  • 14. Mullaney, J.M. et al. Small insertions and deletions (INDELs) in human genomes. Hum. Mol. Genet. 19, R131–R136 (2010).

  • 15. Jiang, Y., Turinsky, A.L. & Brudno, M. The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection. Nucleic Acids Res. 43, 7217–7228 (2015).

  • 16. Narzisi, G. et al. Accurate de novo and transmitted indel detection in exome-capture data using microassembly. Nat. Methods 11, 1033–1036 (2014).

  • 17. Narzisi, G. & Schatz, M.C. The challenge of small-scale repeats for indel discovery. Front. Bioeng. Biotechnol. 3, 8 (2015).

  • 18. Fang, H. et al. Reducing INDEL calling errors in whole genome and exome sequencing data. Genome Med. 6, 89 (2014).

  • 19. DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).

  • 20. Albers, C.A. et al. Dindel: accurate indel calls from short-read data. Genome Res. 21, 961–973 (2011).

  • 21. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  • 22. Ye, K. et al. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).

  • 23. Karakoc, E. et al. Detection of structural variants and indels within exome data. Nat. Methods 9, 176–178 (2012).

  • 24. Iqbal, Z. et al. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat. Genet. 44, 226–232 (2012).

  • 25. Van der Auwera, G.A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 11, 11.10.1–11.10.33 (2013).

  • 26. Li, S. et al. SOAPindel: efficient identification of indels from short paired reads. Genome Res. 23, 195–200 (2013).

  • 27. Pabinger, S. et al. A survey of tools for variant analysis of next-generation genome sequencing data. Brief. Bioinform. 15, 256–278 (2014).

  • 28. Rimmer, A. et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat. Genet. 46, 912–918 (2014).

  • 29. Mose, L.E. et al. ABRA: improved coding indel detection via assembly-based realignment. Bioinformatics 30, 2813–2815 (2014).

  • 30. Chen, K. et al. TIGRA: a targeted iterative graph routing assembler for breakpoint assembly. Genome Res. 24, 310–317 (2014).

  • 31. Weisenfeld, N.I. et al. Comprehensive variation discovery in single human genomes. Nat. Genet. 46, 1350–1355 (2014).

  • 32. Leggett, R.M. et al. Identifying and classifying trait linked polymorphisms in non-reference species by walking coloured de Bruijn graphs. PLoS One 8, e60058 (2013).

  • 33. Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).

  • 34. Yang, R. et al. ScanIndel: a hybrid framework for indel detection via gapped alignment, split reads and de novo assembly. Genome Med. 7, 127 (2015).

    1. Introduction

    Lung cancer is the leading cause of cancer death worldwide. About 80%–85% of lung cancer cases are non-small-cell lung cancer (NSCLC), and the remaining 15%–20% are small-cell lung cancer (SCLC) [1]. NSCLC is divided into three main subtypes: adenocarcinoma, squamous-cell carcinoma and large-cell carcinoma. Adenocarcinoma accounts for around 40% of NSCLC cases. The prognosis for NSCLC is poor, with a five-year survival rate below 20%, and it is even worse for SCLC, with a five-year survival rate below 5% [1].

    For a long time, first-line treatment relied on surgery, chemotherapy or radiotherapy. However, the discovery of several oncogenic driver mutations in patients with NSCLC, particularly those with adenocarcinoma, has allowed the development of personalized treatments that target these specific molecular alterations. For example, EGFR (epidermal growth factor receptor) mutations are found in up to 15% of adenocarcinomas and occur primarily in the tyrosine kinase (TK) domain of the gene. More than 80% of these mutations are in-frame deletions in exon 19 or the L858R point mutation in exon 21. These mutations induce constitutive activation of EGFR, making it a potential therapeutic target. EGFR-mutated patients can therefore benefit from a specific first-line treatment, namely TK inhibitors (TKIs), which competitively inhibit the binding of adenosine triphosphate (ATP) to the catalytic site of the TK domain. Other driver biomarkers in lung cancer (point mutations, rearrangements or amplifications in genes including KRAS, NRAS, HER2, BRAF, ALK, RET and ROS1) have also been proposed, and some of them may provide additional information for clinical decision-making.

    Unfortunately, resistance to these personalized treatments has emerged. In particular, the appearance of the T790M mutation in exon 20 of EGFR results in cancer relapse, generally within 1–2 years. The T790M mutation is present in about half of lung cancer patients with acquired resistance and is reported to increase the affinity of the receptor for ATP relative to its affinity for TKIs [2]. Identifying such mutations is required to select a second-line treatment. Recently, third-generation EGFR inhibitors, such as osimertinib, mereletinib or rociletinib, have been proposed as relevant therapeutics that specifically disrupt the growth of EGFR T790M-positive tumors and could thus increase patient survival [3,4,5].

    2. Tumor Tissue Biopsy Limitations

    Molecular characterization of tumors has become mandatory. It is needed for patients to receive the right treatment and also to follow how the tumor's molecular characteristics evolve, so that treatment can be adapted accordingly [6]. Tissue biopsies remain the gold standard for assessing molecular alterations. However, this strategy has several limitations that can impair patient treatment. First, access to tumor tissue is not always optimal. Many patients with NSCLC are diagnosed at an advanced stage of the disease, which makes surgery or biopsy difficult and sometimes even dangerous. For instance, complications were reported in 17.1% of cases in a series of 211 intrathoracic biopsies [7]. Second, because of insufficient quality or quantity of the available tumor material, EGFR genotyping fails in approximately 5% of cases [8]. Finally, intratumoral heterogeneity of EGFR mutation status has been described in several studies (ranging from 13.9% to 27% [9]). This shows that a tumor biopsy does not systematically reveal the complete genomic landscape of the patient's entire tumor cell population. Altogether, these causes of tissue biopsy failure can leave EGFR status unknown and exclude some patients who could have been eligible for TKI treatment.

    Given these limitations, there is a clear need for practical, economical and less invasive techniques to monitor EGFR TKI therapy in NSCLC. In recent years, noninvasive approaches based on plasma or serum samples have shown great potential for this purpose. Among the materials derived from liquid biopsies, circulating tumor DNA (ctDNA) has been successfully used to detect EGFR mutations in NSCLC patients. It can provide molecular information similar to that obtained from invasive tumor biopsies [10] (Figure 1). In addition, dynamic changes in ctDNA EGFR mutation status may predict the clinical outcome of EGFR TKI therapy [11]. In cases of drug resistance, another way to improve early detection and overcome the limitations of repeated tissue sampling is to perform genomic analysis of other liquid biopsy markers. These include circulating tumor cells (CTCs), circulating RNA, circulating miRNA and platelet markers. Since the use of these markers for lung cancer management has been reported previously, they will not be discussed here [11,12,13,14,15,16,17].

    Here, we summarize the technical approaches that have been proposed for detecting molecular events in ctDNA. We also consider their possible applications in hospitals and routine laboratories for the management and monitoring of patients with lung cancer.

    3. The Biology of cfDNA and Circulating Tumoral DNA (ctDNA)

    New opportunities arose with the discovery of circulating cell-free DNA (cfDNA) in unaffected individuals [18]. Applications span several fields, notably non-invasive prenatal diagnosis using cell-free fetal DNA (cffDNA) [19] and oncology using circulating tumor DNA (ctDNA) [20].

    The origin of cfDNA and the mechanisms of its release into the bloodstream are still not fully understood. It is, however, widely accepted that several conditions, such as inflammation, heavy smoking or pregnancy, can induce the release of cfDNA from cells into the systemic circulation [21,22,23]. In patients with heart injury, for example, an increase in cfDNA over the first 48 h in the emergency intensive care unit predicts a fatal outcome [24]. ctDNA also likely has multiple sources, mainly cell lysis induced by apoptosis and/or necrosis of primary tumors and metastases [25,26] (Figure 1).

    cfDNA and ctDNA are highly fragmented, with a median size of 170 base pairs or less. This corresponds to the DNA wrapped around a nucleosome plus a linker fragment [27,28]. Several studies have tried to infer the release mechanism of ctDNA (necrosis versus apoptosis) from fragment size; however, the results remain controversial [26,29]. Wang et al. [30] and Gao et al. [31] reported that ctDNA is longer than normal cfDNA. By contrast, Diel et al. [27] and Moulière et al. [29] observed shorter ctDNA fragments. Most likely, ctDNA comprises both short and long fragments, with genetic aberrations carried specifically by the shorter ones. This hypothesis has recently been validated in hepatocellular carcinoma patients [32] and in lung cancer patients [22].
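    If mutations are carried preferentially by shorter fragments, then restricting an analysis to short fragments should raise the observed mutant fraction. The Python sketch below illustrates this under several assumptions. Fragment lengths are taken as already inferred from paired-end alignments. The 90–150 bp window is a hypothetical choice, not a validated cutoff. The `Fragment` records are made-up toy values, not measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    length_bp: int          # fragment length, assumed inferred from paired-end reads
    carries_mutation: bool  # whether this fragment supports the mutant allele


def mutant_fraction(fragments: List[Fragment]) -> float:
    """Fraction of fragments carrying the mutation (0.0 for an empty list)."""
    if not fragments:
        return 0.0
    return sum(f.carries_mutation for f in fragments) / len(fragments)


def size_select(fragments: List[Fragment], min_bp: int = 90, max_bp: int = 150) -> List[Fragment]:
    """Keep fragments inside a hypothetical short-size window (inclusive bounds)."""
    return [f for f in fragments if min_bp <= f.length_bp <= max_bp]


if __name__ == "__main__":
    # Toy values for illustration only; not real data.
    toy = [Fragment(120, True), Fragment(130, True), Fragment(140, False),
           Fragment(167, False), Fragment(168, False), Fragment(170, False)]
    print(f"all fragments:  {mutant_fraction(toy):.2f}")
    print(f"size-selected:  {mutant_fraction(size_select(toy)):.2f}")
```

    With these toy records, the mutant fraction rises from one third to two thirds after size selection. This is only because the toy data were built that way. On real samples the gain depends on how strongly fragment size actually differs between tumor-derived and normal cfDNA.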

    4. Technical Approaches for ctDNA Detection and Analysis

    Preanalytical conditions play a crucial role in ctDNA detection. ctDNA is highly fragmented, diluted in non-tumoral cfDNA, present in low amounts and rapidly cleared. For these reasons, detecting molecular events in ctDNA remains challenging and requires adapted, ultrasensitive analytical assays. To address preanalytical issues, specific formaldehyde-free cfDNA collection tubes have recently been commercialized. These tubes stabilize cfDNA, prevent the release of genomic DNA from nucleated blood cells and reduce the need for immediate plasma preparation. In addition, they allow transport and storage at room temperature and are well suited to hospital shipping procedures.

    Comparative analyses of ctDNA in plasma and serum have shown that plasma is the better matrix for monitoring NSCLC patients in clinical practice [33]. However, the dilution of ctDNA in the patient's total cfDNA strongly limits the detection of genetic alterations by liquid biopsy. Only a few thousand copies of cfDNA can be extracted per milliliter of plasma, and only a small fraction of them is clinically relevant. The genetic alterations to be detected are diluted both by non-tumoral cfDNA and by non-mutated ctDNA. Highly sensitive and specific detection methods are therefore required for a relevant ctDNA-based diagnosis. This concern has led to the improvement and development of several detection methods, such as real-time polymerase chain reaction (PCR), digital PCR (dPCR), next-generation sequencing (NGS) and Beads, Emulsion, Amplification, and Magnetics (BEAMing) (Table 1). These methods fall into two groups: (i) targeted approaches, which detect specific, predefined alterations; and (ii) untargeted approaches, which identify events without a priori knowledge, in particular whole-exome or whole-genome sequencing.
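    The dilution problem can be made concrete with simple sampling arithmetic. The Python sketch below converts a cfDNA mass into haploid genome equivalents (GE). It then uses a Poisson model to estimate the chance that the assayed input contains any mutant copy at all. It assumes roughly 3.3 pg per haploid human genome. The 10 ng input and 0.1% mutant fraction in the usage lines are hypothetical values chosen for illustration.

```python
import math

# Assumption: one haploid human genome weighs roughly 3.3 pg.
HAPLOID_GENOME_PG = 3.3


def genome_equivalents(cfdna_ng: float) -> float:
    """Convert a cfDNA mass in ng into haploid genome equivalents (GE)."""
    return cfdna_ng * 1000.0 / HAPLOID_GENOME_PG


def expected_mutant_copies(ge: float, mutant_allele_fraction: float) -> float:
    """Mean number of mutant template copies present in the assayed input."""
    return ge * mutant_allele_fraction


def prob_at_least_k(expected: float, k: int = 1) -> float:
    """Poisson probability that the input contains at least k mutant copies."""
    below = sum(math.exp(-expected) * expected ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    ge = genome_equivalents(10.0)             # hypothetical 10 ng of cfDNA
    lam = expected_mutant_copies(ge, 0.001)   # hypothetical 0.1% mutant fraction
    print(f"{ge:.0f} GE, {lam:.2f} mutant copies expected, "
          f"P(>=1 copy) = {prob_at_least_k(lam):.3f}")
```

    With these inputs, about 3,000 genome equivalents contain only about three mutant copies on average. Even a perfect assay would therefore miss the mutation in roughly 5% of such samples, simply because no mutant molecule was drawn into the reaction.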

    4.1. Real-Time PCR-Based Methods

    Allele-specific amplification combined with real-time PCR is commonly used in clinical settings to detect mutations in formalin-fixed paraffin-embedded (FFPE) tumor tissues. Commercial kits based on the same principle are also widely used to detect single-nucleotide variants (SNVs) or small insertions/deletions (indels). Examples include the therascreen kit (Qiagen, Hilden, Germany) and cobas® (Roche Diagnostics, Meylan, France). However, because these kits were not fully adapted to the detection of rare genetic events, more specific and sensitive PCR-based methods have been engineered. Notably, co-amplification at lower denaturation temperature PCR (COLD-PCR) [53,54] and the Peptide Nucleic Acid-Locked Nucleic Acid (PNA-LNA) PCR clamp method [35,55,56] have been successfully applied to lung cancer samples. Briefly, COLD-PCR enriches low-abundance mutations from a mixture with wild-type sequences, whether the mutations are known or unknown. Mismatched mutant/wild-type heteroduplexes melt at a lower temperature than homoduplexes. A lower (critical) denaturation temperature during PCR therefore preferentially amplifies the mutation-containing sequences [34,57]. COLD-PCR has further been coupled with high-resolution melting (HRM) analysis, pyrosequencing or sequencing to identify the mutations present [34].
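    A deliberately simplified model shows why a modest per-cycle advantage matters. The Python sketch below treats COLD-PCR enrichment as a difference in per-cycle amplification efficiency between mutation-containing and wild-type templates. It does not explicitly model heteroduplex formation or the critical denaturation temperature. The efficiencies used are hypothetical.

```python
def mutant_fraction_after_cycles(initial_fraction: float, eff_mutant: float,
                                 eff_wildtype: float, cycles: int) -> float:
    """Toy model: each cycle multiplies each template class by (1 + efficiency).

    Efficiencies are between 0 (no amplification) and 1 (perfect doubling).
    Returns the mutant fraction among all amplified templates.
    """
    mutant = initial_fraction * (1.0 + eff_mutant) ** cycles
    wildtype = (1.0 - initial_fraction) * (1.0 + eff_wildtype) ** cycles
    return mutant / (mutant + wildtype)


if __name__ == "__main__":
    start = 0.001  # hypothetical 0.1% mutant fraction
    conventional = mutant_fraction_after_cycles(start, 0.9, 0.9, 30)
    cold_like = mutant_fraction_after_cycles(start, 0.9, 0.5, 30)
    print(f"equal efficiencies: {conventional:.4f}")
    print(f"mutant-favoured:    {cold_like:.4f}")
```

    In this toy model, equal efficiencies leave the 0.1% mutant fraction unchanged. A hypothetical 0.9 versus 0.5 efficiency gap over 30 cycles raises it above 50%, illustrating how small per-cycle differences compound exponentially.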

    The PNA-LNA PCR clamp protocol exploits the fact that PNA and LNA probes bind complementary DNA more stably than DNA oligonucleotides do in a DNA/DNA duplex. In this approach, PNA probes bind tightly to the wild-type sequence and specifically inhibit amplification of the wild-type allele. This improves the specific detection of the mutant allele during real-time PCR cycling. An improved PNA-LNA PCR clamp method has been used to detect EGFR mutations in plasma samples [56].

    Efforts have also focused on improving allele-specific amplification. In particular, probe-blocking methods have been engineered to block amplification of wild-type templates and thus increase detection sensitivity for mutant alleles. For example, a minor groove binder (MGB) blocker oligonucleotide [37] and a modified non-extendable primer blocker (NEPB) [36] have been developed. Both have been shown to detect mutations present at 0.1% in a background of wild-type DNA. Scorpion probes, which have been shown to be more sensitive than TaqMan probes, also enable the detection of rare mutations [58,59,60,61,62].
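    The 0.1% sensitivity figure can be related to cycle thresholds (Ct). Assume the mutant-specific and control reactions amplify with the same efficiency. At perfect doubling, a mutant fraction f then delays the mutant-specific reaction by about log2(1/f) cycles. The Python sketch below encodes this relation. It is a back-of-the-envelope model only. Commercial kits rely on their own validated ΔCt cutoffs rather than on this formula.

```python
import math


def fraction_from_delta_ct(delta_ct: float, efficiency: float = 1.0) -> float:
    """Approximate mutant/total template ratio from
    delta_ct = Ct(mutant-specific assay) - Ct(control assay).

    efficiency = 1.0 means perfect doubling each cycle, assumed equal in both assays.
    """
    return (1.0 + efficiency) ** (-delta_ct)


def delta_ct_for_fraction(fraction: float, efficiency: float = 1.0) -> float:
    """Inverse relation: expected delta_ct for a mutant fraction in (0, 1]."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return -math.log(fraction) / math.log(1.0 + efficiency)


if __name__ == "__main__":
    print(f"0.1% mutant -> delta Ct of about {delta_ct_for_fraction(0.001):.2f} cycles")
    print(f"delta Ct of 10 -> about {fraction_from_delta_ct(10):.4%} mutant")
```

    Under these assumptions, a mutation at 0.1% lags the control by roughly 10 cycles. This is why blocking wild-type amplification, rather than relying on late Ct values alone, helps push detection down to that range.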

    Finally, given the large and growing market for detecting mutations in plasma specimens, new versions of commercial kits have been developed. In particular, the cobas® EGFR Mutation Test v2 was the first liquid biopsy test approved by the Food and Drug Administration (FDA) for the detection of EGFR mutations.

    4.2. Digital PCR (dPCR)

    dPCR relies on the same chemistry as real-time PCR, except that the DNA templates are partitioned into many compartments (wells, droplets or chambers). Each compartment is then amplified by PCR and analyzed independently. Because molecules are distributed randomly among compartments, the number of molecules per compartment follows a Poisson distribution. At suitable dilutions most compartments contain zero or one molecule, and a Poisson correction accounts for those containing more. After end-point PCR, the fraction of positive compartments is used to determine the absolute concentration of the target. Several digital PCR platforms are available, based on different partitioning processes: microfluidic chamber-based, microwell chip-based and droplet-based [63]. The most common platform in clinical laboratories is droplet digital PCR (ddPCR), in which samples are dispersed into thousands of droplets. Droplets containing mutated or non-mutated DNA strands can be discriminated by flow cytometry using fluorescent TaqMan-based probes [63]. This allows sensitive detection of mutated ctDNA in a vast background of cfDNA.
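    The Poisson correction described above reduces to λ = -ln(1 - p), where p is the fraction of positive partitions and λ is the mean number of copies per partition. The Python sketch below applies it to obtain an absolute concentration and a mutant allele fraction. It simplifies in two ways. Mutant and wild-type signals are treated as two independently counted channels over the same partitions. The droplet count, the droplet volume of about 0.85 nL and the positive counts in the usage lines are all hypothetical.

```python
import math


def mean_copies_per_partition(positive: int, total: int) -> float:
    """Poisson estimate of mean copies per partition: lambda = -ln(1 - p)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total (all-positive runs are saturated)")
    return -math.log(1.0 - positive / total)


def copies_per_microliter(positive: int, total: int, partition_volume_nl: float) -> float:
    """Absolute target concentration in copies/uL (1 nL = 1e-3 uL)."""
    return mean_copies_per_partition(positive, total) / (partition_volume_nl * 1e-3)


def mutant_allele_fraction(mutant_positive: int, wildtype_positive: int, total: int) -> float:
    """Mutant copies / (mutant + wild-type copies), each channel Poisson-corrected separately."""
    lam_mut = mean_copies_per_partition(mutant_positive, total)
    lam_wt = mean_copies_per_partition(wildtype_positive, total)
    if lam_mut + lam_wt == 0.0:
        return 0.0
    return lam_mut / (lam_mut + lam_wt)


if __name__ == "__main__":
    # Hypothetical run: 20,000 droplets of about 0.85 nL each.
    total, volume_nl = 20_000, 0.85
    print(f"wild-type: {copies_per_microliter(3_000, total, volume_nl):.0f} copies/uL")
    print(f"mutant allele fraction: {mutant_allele_fraction(6, 3_000, total):.3%}")
```

    The correction matters most at high loading. When half the partitions are positive, the mean is about 0.69 copies per partition rather than 0.5, because some positive partitions hold more than one molecule.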

    Besides its high sensitivity, estimated at 0.01% to 0.1% [38], dPCR has a relatively simple workflow that can be implemented in a clinical setting [64]. It has also been applied to the detection of copy number variations (CNVs) in blood samples from lung cancer patients [65]. One disadvantage is that dPCR only screens for known mutations. However, recent work has demonstrated the feasibility of multiplex dPCR for detecting EGFR and KRAS mutations in blood samples from cancer patients [40,66].
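    For CNV detection, dPCR is commonly read as a ratio between a target locus and a reference locus. The Python sketch below shows that ratio, plus a simple linear un-mixing step for plasma, where tumor-derived copies are diluted by normal cfDNA. It makes three assumptions. The reference locus is present at two copies in both tumor and normal cells. The tumor fraction is known from some independent estimate. The function names and example values are hypothetical.

```python
import math


def poisson_lambda(positive: int, total: int) -> float:
    """Mean copies per partition from the fraction of positive partitions."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def observed_copy_number(target_pos: int, reference_pos: int, total: int,
                         reference_copies: int = 2) -> float:
    """Target copy number relative to a reference locus assumed to be
    present at `reference_copies` copies per genome."""
    ref = poisson_lambda(reference_pos, total)
    if ref == 0.0:
        raise ValueError("reference locus not detected")
    return reference_copies * poisson_lambda(target_pos, total) / ref


def tumor_copy_number(observed: float, tumor_fraction: float, normal_copies: int = 2) -> float:
    """Undo linear mixing with normal cfDNA, assuming
    observed = (1 - f) * normal_copies + f * tumor_copies."""
    if not 0.0 < tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be in (0, 1]")
    return (observed - (1.0 - tumor_fraction) * normal_copies) / tumor_fraction


if __name__ == "__main__":
    print(f"balanced counts -> {observed_copy_number(4_000, 4_000, 20_000):.2f} copies")
    print(f"observed 2.6 at 10% tumor fraction -> {tumor_copy_number(2.6, 0.1):.1f} tumor copies")
```

    Under this mixing model, an observed copy number of 2.6 at a hypothetical 10% tumor fraction corresponds to about 8 copies in tumor cells. This suggests why amplifications can be hard to see in plasma when ctDNA is scarce.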

    4.3. Beads, Emulsion, Amplification and Magnetics (BEAMing)

    BEAMing is another targeted approach, based on emulsion PCR. Briefly, a first conventional PCR step is performed with primers that are specific to the target sequence and carry known tag sequences. The amplicons then undergo emulsion PCR in the presence of tag-coupled magnetic beads. Each bead becomes coated with copies of a single template and can easily be purified. After single-base primer extension or hybridization with fluorescent mutant-specific probes, flow cytometric analysis allows the detection and quantification of mutant versus wild-type alleles [42]. In lung cancer samples, this technique has already demonstrated its ability to detect EGFR-activating mutations and the T790M resistance mutation in plasma DNA [41,67,68]. Like dPCR methods, BEAMing only allows screening for known mutations. It also has a complex workflow and a high cost per sample, which make implementation in routine clinical settings less feasible.
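    Because BEAMing ends with counts of mutant and wild-type beads, the mutant fraction and its statistical uncertainty follow from standard binomial arithmetic. The Python sketch below uses the Wilson score interval. This is a swapped-in statistical choice, not a documented part of the BEAMing protocol. The sketch also assumes that each informative bead represents one independent template. The bead counts in the usage lines are hypothetical.

```python
import math
from typing import Tuple


def mutant_fraction_with_ci(mutant_beads: int, wildtype_beads: int,
                            z: float = 1.96) -> Tuple[float, float, float]:
    """Point estimate and Wilson score interval (default ~95%) for the
    fraction of mutant beads among all informative beads."""
    n = mutant_beads + wildtype_beads
    if n == 0:
        raise ValueError("no informative beads")
    p = mutant_beads / n
    denom = 1.0 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical counts: 12 mutant beads among 100,000 informative beads.
    p, lo, hi = mutant_fraction_with_ci(12, 99_988)
    print(f"mutant fraction {p:.4%} (95% CI {lo:.4%} to {hi:.4%})")
```

    The Wilson interval is used here instead of the simple normal approximation because it stays within [0, 1] and remains reasonable when only a handful of mutant beads are observed, which is the usual situation for rare ctDNA mutations.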

    4.4. Next-Generation Sequencing (NGS)-Based Approaches

    NGS relies on sequencing millions of short DNA fragments in parallel and comparing them with a reference sequence. Multiple applications have been developed and are currently used in oncology, such as targeted sequencing and whole-exome or whole-genome sequencing. NGS offers high sensitivity and specificity. However, the random error rate of sequencing platforms is between 0.1% and 1% depending on the platform used [69], which prevents the detection of rare mutations with standard protocols. Accordingly, protocols have been specifically improved and extended to detect rare mutations in plasma samples. Despite its major advantage of detecting multiple somatic alterations simultaneously, NGS remains an expensive and time-consuming technique. Furthermore, the extensive data analysis requires experienced bioinformaticians to identify relevant mutations with high confidence. Nevertheless, global approaches provide a more accurate mutational spectrum of the tumor than targeted analyses and may also allow detection of copy number alterations and large rearrangements [46
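    The effect of a 0.1%–1% error rate on rare-mutation detection can be estimated with a binomial model. The Python sketch below treats sequencing errors at one position as independent events with a single per-read probability. This simplification ignores, for instance, that a substitution error can produce any of three alternative bases. The sketch returns the smallest number of variant-supporting reads that errors alone would reach with probability below a threshold alpha. The depth, error rates and alpha are hypothetical.

```python
import math
from typing import Optional


def log_binom_pmf(i: int, n: int, p: float) -> float:
    """log P(X = i) for X ~ Binomial(n, p), computed with lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k <= 0:
        return 1.0
    return sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))


def min_alt_reads(depth: int, error_rate: float, alpha: float = 1e-6) -> Optional[int]:
    """Smallest alt-read count k such that errors alone give P(X >= k) < alpha."""
    pmf = [math.exp(log_binom_pmf(i, depth, error_rate)) for i in range(depth + 1)]
    tail, threshold = 0.0, None
    for k in range(depth, -1, -1):  # accumulate P(X >= k) from the top down
        tail += pmf[k]
        if tail >= alpha:
            break
        threshold = k
    return threshold


if __name__ == "__main__":
    depth = 10_000  # hypothetical sequencing depth at the position of interest
    for err in (0.001, 0.01):
        print(f"error rate {err:.1%}: need >= {min_alt_reads(depth, err)} alt reads; "
              f"a 0.1% mutation yields about {depth * 0.001:.0f}")
```

    At this hypothetical depth, a mutation at 0.1% allele fraction contributes only about 10 true variant reads. That is fewer than the error-only threshold at either error rate, so the true signal cannot be separated from noise without additional error-suppression steps.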
