AZ Delta hospital
Abstract advanced bachelor of bioinformatics 2019-2020: Copy Number Variant (CNV) and structural variation detection in somatic oncogenic panels
Within the field of next-generation sequencing (NGS) technologies, targeted sequencing panels analyze genes or gene regions with known associations with diseases such as cancer. This allows rapid detection of a variety of somatic mutations. Two methods are commonly used for targeted sequencing: hybridization capture-based and amplicon-based sequencing. Capture-based sequencing allows more targets to be sequenced per panel, whereas amplicon sequencing has a lower cost per sample and higher on-target rates.
Somatic mutations can be divided into small events, such as single nucleotide variants (SNVs), and larger events, such as copy number variations (CNVs). A CNV is a mutation that alters the copy number of a DNA segment, mainly through deletions and duplications. CNVs belong to the broader class of structural variants, which also includes insertions and inversions. Some CNVs are associated with the development of various types of cancer. In contrast to germline CNVs, somatic CNVs are acquired in a body (somatic) cell during life and are therefore not inherited. This project aims to evaluate three bioinformatics tools for somatic CNV detection on gene panels: Contra, Oncocnv and Cnvkit.
CNV detection methods can be grouped into SNP-based and read-count-based approaches. Both approaches are suitable for whole genome sequencing (WGS) and whole exome sequencing (WES), as illustrated by Wisecondor and GATK. For the more cost-effective targeted data, only the read-count approach is practical. Another distinction concerns the choice of the normal reference. Ideally, a matched tumor-normal pair is used, but this is costly. A good alternative is a panel of normals (PoN) [4, 5, 6, 7, 8] built from normal tissue samples. This PoN captures the technical bias of the sequencing platform.
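The panel-of-normals idea can be made concrete with a small sketch. It assumes that read counts have already been computed per bin for every normal sample, using the same bins in the same order. The function name build_pon and the use of the median and the median absolute deviation are illustrative choices, not the exact procedure of Contra, Oncocnv or Cnvkit.

```python
from statistics import median


def build_pon(normal_counts):
    """Build a pooled reference from normal samples (illustrative sketch).

    normal_counts: list of samples, each a list of read counts per bin
    (all samples must use the same bins in the same order).
    Returns (reference, spread): the per-bin median of the depth-normalized
    counts and the per-bin median absolute deviation around that median.
    """
    # Scale each sample by its total read count so deeper libraries do not dominate.
    scaled = []
    for sample in normal_counts:
        total = sum(sample)
        scaled.append([count / total for count in sample])

    n_bins = len(scaled[0])
    # The median is robust against a single atypical normal sample.
    reference = [median(s[i] for s in scaled) for i in range(n_bins)]
    # Spread per bin: bins that are noisy in the normals are less trustworthy in a tumor.
    spread = [median(abs(s[i] - reference[i]) for s in scaled) for i in range(n_bins)]
    return reference, spread
```

Because the reference is the median over the normals, one atypical normal sample barely moves it. The spread flags bins that are noisy even in the absence of a tumor.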
The general method for CNV detection [4, 5, 6, 7, 8] consists of the following steps. First, the coverage of the targets in the tumor sample is calculated, and these read counts are divided into bins. Next, a PoN is created. The tumor coverage is then normalized to this pooled reference, and further corrections are applied to remove biases such as GC content. The final step is the identification (calling) of deleted and/or amplified regions. The calling and normalization steps are the main sources of differences in results between the tools. Optionally, the results can be plotted. Cnvkit in particular can produce heatmaps, scatter plots and chromosome ideograms.
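The normalization and calling steps can be sketched for a single tumor sample. The sketch assumes three inputs:

- per-bin tumor read counts;
- a pooled reference giving the expected fraction of reads per bin, for example the per-bin median of depth-normalized normals;
- a GC fraction per bin.

The median-per-GC-stratum correction and the ±0.3 log2 thresholds are illustrative assumptions. The tested tools use their own normalization models, and they also segment neighbouring bins before calling. That segmentation step is omitted here.

```python
import math
from collections import defaultdict
from statistics import median


def log2_ratios(tumor_counts, reference, gc_content, gc_step=0.05):
    """Depth-normalize tumor counts, compare them to the reference, correct for GC."""
    total = sum(tumor_counts)
    # A pseudocount of 0.5 avoids log2(0) for bins without reads.
    raw = [math.log2((max(c, 0.5) / total) / ref)
           for c, ref in zip(tumor_counts, reference)]

    # Group bins with a similar GC fraction (strata of width gc_step).
    strata = defaultdict(list)
    for i, gc in enumerate(gc_content):
        strata[round(gc / gc_step)].append(i)

    # Within each stratum, subtract the median log2 ratio: most bins are assumed
    # copy-neutral, so a shifted median reflects GC bias rather than a CNV.
    corrected = list(raw)
    for bins in strata.values():
        shift = median(raw[i] for i in bins)
        for i in bins:
            corrected[i] = raw[i] - shift
    return corrected


def call_bins(log2r, gain=0.3, loss=-0.3):
    """Label each bin as gain, loss or neutral using fixed (hypothetical) log2 thresholds."""
    return ["gain" if r > gain else "loss" if r < loss else "neutral" for r in log2r]
```

After correction, a bin with twice the expected coverage ends up at a log2 ratio of about 1 (a gain). A bin with half the expected coverage ends up at about -1 (a loss).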
In a first stage, data was simulated at home. An Illumina target panel was provided. Only the targets on chromosome 7 were used, because this chromosome has the most targeted bases in the panel; this kept the simulation feasible with the computing resources available at home. TargetedSim was used to simulate the reads of normal samples and of samples containing CNVs. These reads were then mapped with BWA and sorted with Samtools. The CNV detection tools required two inputs: a BED file with the targets and BAM files with the aligned reads of the samples. In a second stage, real patient data was used. The data was first preprocessed with Trimmomatic, mapped with BWA and sorted with Elprep. Podman containers were used to run the different tools.
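Selecting the chromosome with the most targeted bases is a simple computation on the panel's BED file. The sketch below assumes a tab-separated BED file with at least the chrom, start and end columns. The function names are hypothetical, and this is not the script used in the project.

```python
from collections import defaultdict


def parse_bed(lines):
    """Return (chrom, start, end) tuples from BED lines, skipping header lines."""
    targets = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        targets.append((chrom, int(start), int(end)))
    return targets


def bases_per_chromosome(targets):
    """Total targeted bases per chromosome (BED intervals are 0-based, half-open).

    Overlapping targets are counted twice; merge them first for exact totals.
    """
    totals = defaultdict(int)
    for chrom, start, end in targets:
        totals[chrom] += end - start
    return dict(totals)


def restrict_to_largest_chromosome(targets):
    """Keep only the targets on the chromosome with the most targeted bases."""
    totals = bases_per_chromosome(targets)
    best = max(totals, key=totals.get)
    return best, [t for t in targets if t[0] == best]
```

The resulting subset can be written back to a BED file and used both for the simulation and as target input for the CNV tools. Merging overlapping targets beforehand, for example with bedtools merge, gives exact base counts.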
All of the tools were able to detect the CNVs in the simulated data; Contra was the only tool with a high false positive rate. Similar results were obtained on the patient data (Fig 1B). In most samples, Contra was the only tool that reported a CNV, which confirms the high false positive rate observed in the simulation. Approximately 50% of the CNVs reported by Oncocnv were confirmed by at least one of the other tools. Cnvkit detected the largest number of CNVs that were also found by one of the other tools. Combining tools therefore increases the likelihood that a reported CNV is a true positive. For some samples, the presence of a CNV was confirmed by FISH (only when clinically relevant). Most of these FISH-confirmed CNVs (true positives) were reported by all three tested tools. One of them was detected only by Contra, and only its non-adjusted p-value was significant. Two FISH-confirmed CNVs were not found by any of the three tools. A possible explanation is that the FISH probes target a different part of the gene than the sequencing probes.
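The difference between adjusted and non-adjusted p-values can be illustrated with the Benjamini-Hochberg false discovery rate correction. This is a common way to adjust p-values when many regions are tested at once. Whether Contra uses exactly this procedure is an assumption here. The sketch only shows why a region can be significant before adjustment but not after it.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value and scale each by m / rank.
    # Taking the running minimum keeps the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Consider the raw p-values [0.01, 0.04, 0.03, 0.20] with a 0.05 cut-off. Before adjustment, 0.03 and 0.04 are both significant. After adjustment, both become about 0.053 and no longer pass.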
In conclusion, caution is warranted with respect to false positive results. Cnvkit and Oncocnv appear to be the most reliable tools for CNV detection. Combining the tools drastically reduced the number of false positives while preserving the high true positive rate. The CNVs detected by all three tools were confirmed by FISH. Further validation on a larger patient dataset is ongoing.
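Combining the tools, as concluded above, amounts to keeping only calls that several tools agree on. The sketch below assumes that every tool's output has first been converted to (chromosome, start, end, gain/loss) tuples. The overlap rule (any shared base with the same direction) and the min_tools parameter are illustrative, not the matching criterion used in the project.

```python
def overlaps(a, b):
    """True if two calls (chrom, start, end, kind) overlap and have the same kind."""
    return a[0] == b[0] and a[3] == b[3] and a[1] < b[2] and b[1] < a[2]


def support(calls_by_tool):
    """For every call, count the tools (including its own) with an overlapping call."""
    result = []
    for tool, calls in calls_by_tool.items():
        for call in calls:
            n_tools = sum(
                any(overlaps(call, other) for other in others)
                for others in calls_by_tool.values()
            )
            result.append((tool, call, n_tools))
    return result


def consensus(calls_by_tool, min_tools=2):
    """Keep only calls confirmed by at least min_tools tools."""
    return [(tool, call) for tool, call, n in support(calls_by_tool) if n >= min_tools]
```

Suppose a region is called only by Contra (for example a hypothetical loss on chr17) while a chr7 gain is called by all three tools. In that case, only the chr7 gain survives, which mirrors the reduction in false positives reported above.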
IDT (n.d.). Targeted sequencing: Hybridization capture or amplicon sequencing? Retrieved 04/06/2020, from https://eu.idtdna.com/pages/education/decoded/article/targeted-sequencing-hybridization-capture-or-amplicon-sequencing
Illumina (n.d.). Introduction to Targeted Gene Sequencing. Retrieved 04/06/2020, from https://www.illumina.com/techniques/sequencing/dna-sequencing/targeted-resequencing/targeted-panels.html
Zhang N., Wang M., Zhang P., Huang T. (2016). Classification of cancers based on copy number variation landscapes. Biochim Biophys Acta. 1860:2750–2755.
Li J., Lupat R., Amarasinghe K.C., Thompson E.R., Doyle M.A., Ryland G.L., et al. (2012). CONTRA: copy number analysis for targeted resequencing. Bioinformatics. 28(10):1307–1313.
Boeva V., et al. (2014). Multi-factor data normalization enables the detection of copy number aberrations in amplicon sequencing data. Bioinformatics. 30(24):3443–3450.
Talevich E., Shain A.H., Botton T., Bastian B.C. (2016). CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing. PLoS Comput Biol. 12(4):e1004873.
Straver R., Sistermans E.A., Holstege H., Visser A., Oudejans C.B.M., Reinders M.J.T. (2014). WISECONDOR: detection of fetal aberrations from shallow sequencing maternal plasma based on a within-sample comparison scheme. Nucleic Acids Res. 42(5):e31.
McKenna A., Hanna M., et al. (2010). The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297–1303.
Li H., Durbin R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 25(14):1754–1760.
Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R., 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25(16):2078–2079.
Bolger A.M., Lohse M., Usadel B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 30(15):2114–2120.
Herzeel C., Costanza P., Decap D., Fostier J., Reumers J. (2015). elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling. PLoS ONE. 10(7):e0132868.
Podman (n.d.). Podman. Retrieved 11/06/2020, from https://podman.io/
Abstract 2018-2019: Implementation of a bioinformatic pipeline for whole genome NIPT data using WisecondorX and extended mapping tools with gene annotation
Background and aim:
Non-invasive prenatal testing (NIPT) has three uses: detecting trisomy 13 (Patau syndrome), trisomy 18 (Edwards syndrome) and trisomy 21 (Down syndrome), determining the sex of the fetus, and screening for abnormalities of the sex chromosomes. Shallow whole genome sequencing (sWGS)-based NIPT can additionally evaluate all chromosome pairs. This allows it to identify losses or gains of entire chromosomes, or segmental defects such as microdeletions. Recently, a new bioinformatic tool called WisecondorX was reported (Raman et al., 2018), a further development of the WISECONDOR script introduced by Straver et al. (2013). WisecondorX detects chromosomal copy number aberrations by within-sample normalization of sWGS data, with a resolution comparable to classic karyotyping. This provides an overview of the complete genome of both the mother and the child. A second tool, SeqFF, was introduced by Kim et al. (2015). This tool estimates the fetal DNA fraction in the plasma of pregnant women from sequence read counts. It relies on the different sequencing behavior of maternal and fetal cell-free DNA (cfDNA). The aim of the project was to integrate WisecondorX and SeqFF into an automated pipeline to analyze sWGS-NIPT data generated by the Illumina VeriSeq NIPT Solution. VeriSeq is a high-throughput NIPT technology with automated liquid handling. Its data analysis algorithm for prenatal screening of trisomy 13, 18 and 21 and sex chromosome anomalies is proprietary and undisclosed, but CE-IVD certified. The WisecondorX-SeqFF pipeline will be used for independent analysis of VeriSeq data. It will also be used to resolve cases with various error codes due to suspected rare autosomal trisomies other than trisomy 13, 18 and 21.
WisecondorX (a revised version of WISECONDOR: WIthin-SamplE COpy Number aberration DetectOR) is a freely available Python- and R-based software package for the detection of copy number aberrations across the whole genome. A reference set was built from plasma samples of 300 pregnancies with normal outcome (absence of sex chromosome anomalies and of trisomy 13, 18 and 21): 150 with male and 150 with female fetuses. WisecondorX contains four modules:
- a module to convert aligned reads (BAM files) into binned read counts;
- a module to build a new reference set;
- a module for the actual evaluation of the whole genome;
- a module to predict the gender.

The underlying workflow includes a Burrows-Wheeler transform (BWT)-based alignment and segmentation of the reference genome into bins. For each bin, a z-score is then calculated for its read frequency against an optimized set of reference bins, selected in a prior iterative training step. Trisomy is assessed by aggregating the z-scores of groups of bins along a chromosome using Stouffer's Z-score method. An automated pipeline was set up to perform the whole analysis of the NIPT data and generate a report per sample. Three additional tools were needed to complete the pipeline: bcl2fastq, bowtie2 (with human reference genome GRCh38) and biobambam. SeqFF is a freely available R-based script (including the trained statistical model) for estimating the fraction of fetal cfDNA. The estimate is the average of the elastic net (Enet) and weighted rank selection criterion (WRSC) predictions. These two models were chosen for their strong performance on high-dimensional data with small sample sizes and for their complementary data assumptions. The SeqFF pipeline requires two additional steps before the provided R script is run: alignment with bowtie2 to human reference genome GRCh37, and a grep-based filtering step.
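The two statistical steps named above can be sketched as follows: a within-sample z-score per bin, and Stouffer's method to combine z-scores along a chromosome. This is not WisecondorX's code. The inputs are assumed to be normalized read fractions from one sample, and the way WisecondorX itself selects and weights reference bins is not reproduced here.

```python
import math
from statistics import mean, stdev


def bin_zscore(bin_value, reference_bin_values):
    """Within-sample z-score of one bin against its reference bins.

    Both arguments come from the same sample: bin_value is the normalized
    read fraction of the bin under test, reference_bin_values are the
    fractions of the bins selected as its reference during training.
    """
    mu = mean(reference_bin_values)
    sd = stdev(reference_bin_values)
    return (bin_value - mu) / sd


def stouffer(zscores, weights=None):
    """Combine per-bin z-scores into one z-score for a chromosome or segment.

    Unweighted: sum(z) / sqrt(k). Weighted: sum(w * z) / sqrt(sum(w ** 2)).
    """
    if weights is None:
        weights = [1.0] * len(zscores)
    numerator = sum(w * z for w, z in zip(weights, zscores))
    return numerator / math.sqrt(sum(w * w for w in weights))
```

For example, four bins that each deviate by z = 2 combine to z = 4 (8 / √4). This is how many modest per-bin deviations add up to a clear chromosome-level signal.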
First, a reference set was built from 300 healthy control samples to select the optimal set of normalizing reference bins for each bin of the genome. The WisecondorX pipeline was then tested in one proof-of-concept run. After the output (bin size, report format) had been optimized, a larger validation set of 13 consecutive VeriSeq runs (N=588 samples) was analyzed. This set served for a method comparison with the VeriSeq NIPT algorithm and, for some samples, with a targeted SNP-based NIPT technique (Multiplicom Clarigo). Overall, WisecondorX showed good concordance with VeriSeq. It also helped resolve a number of cases that generated VeriSeq error codes due to rare autosomal trisomies, technical artefacts or maternal cancer. However, WisecondorX yielded one false negative trisomy 21 result in a sample with a low fetal fraction (< 4%), which VeriSeq identified correctly. WisecondorX also reported subchromosomal gains and losses, which mostly arose from sequencing and/or alignment artefacts in challenging genomic regions with repetitive DNA sequences. Further research is required to distinguish these noisy events from potentially true segmental gains or losses.
Automated sWGS-NIPT analysis using WisecondorX and SeqFF is a useful addition to the proprietary VeriSeq data analysis but cannot replace it. In particular, it can be used for independent assessment of suspected rare autosomal trisomies and to resolve technical sequencing artefacts. Caution is warranted in the clinical use of WisecondorX results at low fetal fractions (SeqFF estimate < 4%).
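The fetal-fraction step described for SeqFF can be sketched together with the 4% caution threshold above. Both Enet and WRSC are treated here as linear predictors on normalized read counts per bin. The coefficients in the example are hypothetical placeholders. The real SeqFF R script ships its own trained model and preprocessing.

```python
FF_CAUTION_THRESHOLD = 0.04  # 4% fetal fraction, as in the conclusion above


def linear_prediction(intercept, coefficients, features):
    """Linear model prediction: intercept + sum(coefficient * feature)."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))


def seqff_like_estimate(features, enet_model, wrsc_model):
    """Average two model predictions and flag low fetal fractions.

    enet_model and wrsc_model are (intercept, coefficients) tuples; any values
    passed in here are hypothetical, not the trained SeqFF model.
    """
    enet = linear_prediction(*enet_model, features)
    wrsc = linear_prediction(*wrsc_model, features)
    fetal_fraction = (enet + wrsc) / 2
    return fetal_fraction, fetal_fraction < FF_CAUTION_THRESHOLD
```

Averaging two complementary models reduces the impact of the error of either one. The flag marks samples whose WisecondorX result should be interpreted with extra caution.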
Abstract Bachelor Project 2017-2018 (Pathology): Validation of the Autostainer Link 48
The aim of this research was to validate a new Autostainer Link 48 from Agilent as a replacement for the old platform. The validation had to ensure correct staining before the device was used for patient samples. The Autostainer Link 48 stains paraffin slides using immunohistochemistry. Four parameters were tested: repeatability, reproducibility, correctness and homogeneity.
To test repeatability, antibodies such as CD7, CK7, CD45, CK PAN and Ki67 were stained together in one run on each of three different days, with one slide per antibody in each run. Reproducibility was tested differently. The same antibodies were stained with three slides per antibody in a single run, on one day only.
Correctness was examined by staining once, in a single run, with antibodies such as PMS2, MLH1, MSH2, MSH6, CD117, S100, PD-L1, P53 and BCL6. Finally, homogeneity was examined. An antibody such as CK PAN or vimentin was placed in all 48 positions of the staining platform to check whether the staining was homogeneous.
All four parameters were approved by the pathologists, and the Autostainer Link 48 was therefore released for the analysis of patient samples. The validation was carried out before the old Autostainer was removed.
Abstract Bachelor Project 2017-2018 (Ardolab, Clinical lab): Implementation of Point of Care Testing for Activated Clotting Time
Last year, AZ Delta decided to replace its current activated clotting time (ACT) analyzers, the "ACT Plus" by Medtronic, with the "i-STAT Alinity" analyzer by Abbott. The main reason was that the eight analyzers from 2006 and the one analyzer from 2004 were due for renewal.
ACT is measured to determine how much heparin should be administered during surgery. Without anticoagulant, the ACT of a healthy person is 90 to 130 seconds. During cardiac surgery, the target ACT is 450 seconds or more. The measuring range is therefore very wide. Moreover, there is no gold standard ACT method and hence no "true" ACT value.
The main aim of the experiments was to evaluate the i-STAT Alinity from Abbott through performance testing, in order to decide whether the device can be used in routine practice.
All nine instruments were tested with two control levels, and the results were within the manufacturer's reference values. The measurement variation was below 5%. Reproducibility was tested on three instruments using control levels and a plasma pool, and the CV was below 5%. Normal reference values were also determined; they comply with the range recommended by i-STAT. The correlation test showed good correlation between two devices.
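The precision and correlation figures above can be computed with standard formulas. The coefficient of variation (CV = standard deviation / mean × 100) applies to repeated measurements, and the Pearson correlation coefficient applies to the comparison between devices. The sketch below uses these formulas. Whether the study used exactly these formulas, and whether 5% served as a hard pass criterion, are assumptions.

```python
import math
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in %: sample standard deviation / mean x 100."""
    return stdev(values) / mean(values) * 100


def meets_cv_criterion(values, limit=5.0):
    """True if the CV of repeated measurements is below the limit (in %)."""
    return cv_percent(values) < limit


def pearson_r(x, y):
    """Pearson correlation coefficient between paired results of two devices."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Note that a high correlation coefficient shows that two devices rank samples consistently. It does not by itself prove that they give identical ACT values.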
From the performance tests, it can be concluded that the i-STAT Alinity analyzer meets the criteria defined by AZ Delta.
Background and aim: Standard molecular diagnostic testing for metastatic colorectal cancer (CRC) includes analysis of somatic mutations in KRAS, NRAS and BRAF and assessment of ERBB2 amplification. More recently, routine testing was extended to the detection of microsatellite instability (MSI). Microsatellites are repetitive DNA tracts that are prone to polymerase slippage during DNA replication. In healthy cells, such replication errors are corrected by the DNA mismatch repair (MMR) system. Loss of function of MMR pathway proteins (MSH2, MLH1, PMS1, PMS2, MSH6 or MSH3) results in variation of the repeat lengths, known as microsatellite instability (MSI). MSI is the hallmark of Consensus Molecular Subtype 1 (CMS1) of CRC, encountered in 15-20% of all CRC. Owing to their DNA instability, these tumours are hypermutated and highly immunogenic, and tend to respond favourably to immune checkpoint inhibitor therapy. Currently, MSI is measured by a separate test. This is done either by PCR analysis of specific loci (MSI-PCR) or by immunohistochemistry staining for loss of MMR protein expression (MSI-IHC). To improve the efficiency of the diagnostic workflow, we aimed to integrate MSI testing into our standard NGS workflow.
Methods: mSINGS (MSI by NGS) is a Python-based open-source script for MSI analysis using NGS data. The script analyses 14 microsatellite loci included in a hybridization capture-based gene panel (NimbleGen SeqCap EZ Choice, KAPA HyperPlus workflow, Roche). It compares the VarScan readcount file of each experimental sample with a baseline trained on microsatellite-stable (MSS) control samples. MSI status is determined by the fraction of unstable loci. We implemented mSINGS on an HPC (High-Performance Computing) cluster at PSB (Plant Systems Biology) Ghent. We also optimized the visual display, automated the workflow and validated MSI-NGS against MSI-PCR and MSI-IHC as reference techniques.
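The decision logic described above can be sketched in Python: each locus is compared with an MSS baseline, and the fraction of unstable loci is taken. This is a simplified illustration, not mSINGS itself. The following elements are all assumptions chosen for the example:

- the input format (read counts per repeat length at each locus);
- the 5% peak threshold;
- the 3-standard-deviation instability rule;
- the 0.2 MSI cut-off.

```python
from statistics import mean, stdev


def count_peaks(length_counts, rel_threshold=0.05):
    """Number of repeat lengths whose read support reaches rel_threshold of the top length."""
    top = max(length_counts.values())
    return sum(1 for n in length_counts.values() if n >= rel_threshold * top)


def build_baseline(mss_peak_counts):
    """{locus: [peak count in each MSS control]} -> {locus: (mean, standard deviation)}."""
    return {locus: (mean(v), stdev(v)) for locus, v in mss_peak_counts.items()}


def msi_score(sample_peaks, baseline, multiplier=3.0):
    """Fraction of loci whose peak count exceeds baseline mean + multiplier * SD."""
    unstable = sum(
        1 for locus, peaks in sample_peaks.items()
        if peaks > baseline[locus][0] + multiplier * baseline[locus][1]
    )
    return unstable / len(sample_peaks)


def msi_status(score, cutoff=0.2):
    """Classify a sample as MSI or MSS from its fraction of unstable loci."""
    return "MSI" if score >= cutoff else "MSS"
```

Excluding loci that barely differ between MSS and MSI samples, as done for 3 of the 14 loci in this study, sharpens the score. Those loci otherwise dilute the fraction of unstable loci.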
Results: First, mSINGS was tested on 3 MSS and 3 MSI colon samples whose microsatellite status was concordant by MSI-PCR and MSI-IHC. After the recommended baseline validation for custom assays, 3 target genes were excluded to improve discriminatory statistical power. Results were visualized in R Markdown and compared with IHC staining and PCR-based MSI measurement. Next, mSINGS was tested in a larger cohort of samples (n=30), using MSI-IHC as reference. Overall, MSI-NGS using 11 discriminant biomarker regions showed > 95% concordance with the reference assay, thereby validating its clinical use.
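Concordance with the reference assay is the percentage of samples that receive the same MSI/MSS call. Splitting the discordant cases into sensitivity and specificity shows in which direction the assays disagree. The sketch below illustrates this with hypothetical calls. The abstract itself only reports overall concordance.

```python
def concordance_percent(test_calls, reference_calls):
    """Percentage of samples with the same call in both assays."""
    pairs = list(zip(test_calls, reference_calls))
    agree = sum(1 for t, r in pairs if t == r)
    return 100.0 * agree / len(pairs)


def sensitivity_specificity(test_calls, reference_calls, positive="MSI"):
    """Sensitivity and specificity of the test assay against the reference.

    Requires at least one reference-positive and one reference-negative sample.
    """
    pairs = list(zip(test_calls, reference_calls))
    tp = sum(1 for t, r in pairs if t == positive and r == positive)
    fn = sum(1 for t, r in pairs if t != positive and r == positive)
    tn = sum(1 for t, r in pairs if t != positive and r != positive)
    fp = sum(1 for t, r in pairs if t == positive and r != positive)
    return tp / (tp + fn), tn / (tn + fp)
```

For example, suppose MSI-NGS calls four samples MSI, MSS, MSS, MSI, while MSI-IHC calls them MSI, MSS, MSI, MSI. The concordance is then 75%, with a sensitivity of 2/3 and a specificity of 1.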
Conclusion: Implementation of mSINGS to analyse MSI status from available NGS data increases the efficiency of molecular classification in CRC tumours and provides a robust and accurate clinical tool to select patients potentially responsive to immunotherapy.
In the Pathology laboratory of AZ Delta Campus Westlaan in Roeselare, tissues and body fluids are examined. The laboratory is divided into four compartments. The first is the room where tissue is cut into smaller pieces and placed in cassettes. The second is the "cutting and staining" room. Here, the cassettes are embedded after processing in the device, and the embedded tissues are cut into sections. All sections are stained with hematoxylin-eosin or with additional stains. The third compartment is the cytology department, where body fluids are processed. Finally, the immunohistochemistry department performs the immunohistochemical stainings.
The purpose of this bachelor thesis is to analyze the administrative part of this workflow with a risk analysis. What can go wrong in the administration while a sample is processed? What is the risk? How are these errors discovered? What could be the consequences of such a mistake?
For this purpose, a risk analysis was performed using the FMEA (Failure Mode and Effect Analysis) method. FMEA is a stepwise risk analysis aimed at minimizing errors. In this analysis, the possible errors, their causes, their detection and their consequences were examined. Once these steps were completed, an RPN (Risk Priority Number) could be calculated. The RPN takes into account the probability of occurrence, the detectability and the consequence of the error. If this value was higher than six, action was required.
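The RPN step can be written out explicitly. In FMEA, the RPN is conventionally the product of the occurrence, detection and severity scores. The scoring scales used in this study are not stated, so the integer scores below are assumptions. Only the "higher than six" action threshold comes from the text.

```python
def risk_priority_number(occurrence, detection, severity):
    """RPN = occurrence x detection x severity, each an integer score >= 1.

    By FMEA convention, a higher detection score means the error is harder
    to detect, so poorly detectable errors raise the RPN.
    """
    for score in (occurrence, detection, severity):
        if not isinstance(score, int) or score < 1:
            raise ValueError("scores must be integers >= 1")
    return occurrence * detection * severity


def action_required(rpn, threshold=6):
    """Action is needed when the RPN is higher than the threshold (six in this study)."""
    return rpn > threshold
```

Because the score is a product, a measure that lowers any single factor lowers the RPN. For example, an electronic form that makes an error easier to detect reduces the RPN without changing how often the error occurs.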
Many errors can occur in the administrative workflow of the pathology laboratory. Most errors were observed when filling in the request form, during registration and during reporting. The most serious consequences are sample or patient mix-ups, which result in a high RPN.
The most common risks can be avoided by introducing an electronic request form. These measures minimize the margin of error and bring the RPN below six. They ensure that errors are corrected within the laboratory administration, which reduces the risk of a wrong diagnosis.
The aims of this study were to evaluate the performance of a new automated system for immunohematological analyses (Erytra® - DG Gel; Grifols) and to compare the data with two widely used systems, namely Ortho BioVue (AutoVue® - OCD) and DiaMed-ID (ID-Gelstation® – Bio-Rad). Blood group assays and antibody screenings are performed as pretransfusion tests and during pregnancy. This research was conducted in the context of standardizing the automation of immunohematological testing and the preventive replacement of some older devices.
The three systems were evaluated and compared over a period of five weeks. Most samples were collected from routine testing. Special samples, for example with positive direct antiglobulin tests or positive antibody screening and identification, came from two sources: some were gathered in the previous months and others were obtained from other laboratories. An analytical validation was performed, including method comparison, sensitivity and reproducibility testing. Operational functionalities were also evaluated, such as turnaround time, volume testing, carry-over and error generation.
Overall, the method comparison revealed only small differences between the three methods. In the ABO assay, Grifols reacts more strongly in reverse grouping and is more sensitive to double (mixed-field) populations. Weak Rhesus D reactions were also detected by Grifols.
In the indirect antiglobulin test (IAT) screening, minor differences in sensitivity were seen between the methods for certain samples. Grifols performed equally to Bio-Rad. In IAT identifications, Grifols was equivalent to OCD and Bio-Rad. In some cases, the enzyme phase was more sensitive to anti-Rh antibodies, while the Coombs screening proved less sensitive to anti-Lea. It should be kept in mind that the Grifols analyses were performed automatically, whereas the other two methods were run manually. Samples stored at refrigerator temperature gave reproducible immunohematological results for up to seven days. When weak reactions were repeated, Erytra®/Grifols proved less sensitive than the other systems. The sensitivity assay with titrated anti-D also showed a weaker sensitivity compared with OCD. Expressed in reaction strength, however, this difference was no more than one grade. This may be due to the use of cards with glass beads instead of cards with gel.
At the operational level, Grifols scored satisfactorily in the turnaround time determination. However, no system could meet the BCSH guideline for an ABO determination. The volume tests show that Erytra® can produce a result with smaller volumes of whole blood. The exception is the IAT, where the required volume is intermediate between those of ID-Gelstation® and AutoVue®. The differences between the three methods were only minimal. None of the methods showed signs of carry-over.
This comparison shows that the Grifols reagents are equivalent to those of Bio-Rad and OCD, apart from the test-specific differences noted above. The Erytra® device was appreciated as a robust system for immunohematological analyses. The comparison of the corresponding software was not included in this thesis. The final choice will take those results into account, together with other aspects such as ease of use, finances and company service.
The current study investigates the performance of GeneXpert for the detection of the FII and FV Leiden mutations. Several aspects of the test were compared with the in-house PCR assay currently in use: reproducibility, accuracy, sample type and sample storage conditions. In addition to these performance criteria, the total cost per test on both platforms was calculated.
The GeneXpert assay performed excellently. Both reproducibility and accuracy were 100% compared with the in-house PCR. Moreover, the manufacturer's storage recommendations were exceeded without any impact on test accuracy: samples were kept at 2-8 °C for 15 days. Despite this good performance and its ease of use, it was decided not to implement the GeneXpert assay in routine practice. This decision was mainly based on its high cost compared with the in-house PCR.
Dr. Anne Vandewiele
Dr. Inge Vanhaute
Conny Van Keirsbulck