University of Antwerp, multiple faculties

Contact details
Traineeship proposition

Abstract advanced bachelor of bioinformatics 2019-2020: Genetic variants of Dipeptidyl peptidase 9 in humans and their association with immunological and oncological pathology

BACKGROUND – The Laboratory of Medical Biochemistry at UAntwerp investigates the role of peptidases in health and disease, searching for new drug targets and disease markers, and examines the structure-function relationships of proline-specific peptidases. Dipeptidyl peptidases 8 (DPP8) and 9 (DPP9) are intracellular N-terminal dipeptidyl exopeptidases, meaning that they hydrolyze a peptide bond near the N-terminal end of their substrates, usually after a proline residue. These peptidases play important roles in immunological and oncological pathology, among others. Their three-dimensional structures were determined only recently. Distinguishing the two dipeptidyl peptidases under different experimental conditions remains a real challenge. Fortunately, intensive research has already been conducted into their presence in immune and cancer tissues, and many genetic variants have been found. These variants, ranging from benign to cancer-associated, must be collected and presented clearly to gain more insight into the important sites at the gene and protein level.

METHODS – During the research project, online databases were searched extensively for genetic variants of DPP9 with different clinical assertions, located on Genome Reference Consortium Human Build 37 (GRCh37). The search was repeated for DPP8 so that the two genes could be compared afterwards. The National Center for Biotechnology Information (NCBI) is well known and provides an extensive collection of genomic data. The NCBI search was performed via a Python script using the BeautifulSoup module and the E-utilities, in which the gene of interest and the search query can be specified. Genomic coordinates and associated data were collected from the Structural Variation Database (dbVar), the database linking genomic variation with human health (ClinVar) and the Single Nucleotide Polymorphism Database (dbSNP). Furthermore, cancer-driver mutations from different databases were collected and compared. Large projects such as The Cancer Genome Atlas (TCGA) and The International Cancer Genome Consortium (ICGC) Data Portal provide a large body of cancer-driver mutations, compiled in close cooperation with leading cancer and genomic researchers and organizations. The data gathered from NCBI, TCGA, ICGC and TumorPortal were mapped in R as GRanges objects using the Gviz Bioconductor package. Finally, for some interesting genetic variants, the sites of change in the proteins were visualized in PyMOL to inspect their position in 3D relative to the important protein regions. A general flowchart can be found on the next page.
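
As an illustration of the NCBI search step, the sketch below builds E-utilities ESearch URLs for one gene across the three NCBI databases named above. It is a minimal sketch, not the project's script: the function names, the `retmax` default and the use of the `[gene]` field tag for every database are assumptions, and the field tag may need adjusting per database. No request is sent; the XML that such a URL returns could then be parsed, for instance with BeautifulSoup.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(gene, database, retmax=500):
    """Return an ESearch URL for one gene in one NCBI database.

    The "[gene]" field tag and the retmax default are assumptions
    made for this sketch.
    """
    params = {
        "db": database,           # E-utilities database name, e.g. "clinvar"
        "term": f"{gene}[gene]",  # restrict the query to the gene field
        "retmax": retmax,         # maximum number of record IDs to return
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_urls_for_gene(gene, databases=("clinvar", "snp", "dbvar")):
    """Return one ESearch URL per database for the given gene."""
    return {db: build_esearch_url(gene, db) for db in databases}


# Build the URLs for both genes of interest.
dpp9_urls = build_urls_for_gene("DPP9")
dpp8_urls = build_urls_for_gene("DPP8")
print(dpp9_urls["clinvar"])
```

Keeping the gene and the database as parameters mirrors the idea that the same search can be repeated for DPP8 and compared with DPP9 afterwards.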

CONCLUSION – By comparing the genomic coordinates of the common genetic variants with those found in cancer databases, using the Gviz R package, important sites in protein-coding genes, and ultimately in the proteins, can be identified. Exons 16 and 17 of DPP9 appear to be more prone to cancer-associated mutations than the other exons. For DPP8, cancer-associated mutations are seen more often in exon 17. Some of these cancer-associated variants cause changes in the R-segment of the protein.
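
The per-exon comparison can be pictured with a small sketch that counts how many variant positions fall inside each exon. The project itself did this mapping in R with GRanges objects and Gviz; the Python version below is a stand-in for illustration only, and the exon boundaries and variant positions are made-up toy values, not DPP9 or DPP8 coordinates.

```python
from collections import Counter


def count_variants_per_exon(exons, variant_positions):
    """Count the variant positions that fall inside each exon.

    exons: dict mapping exon number -> (start, end), inclusive coordinates.
    variant_positions: iterable of genomic positions on the same chromosome.
    Positions outside every exon (for example intronic ones) are ignored.
    """
    counts = Counter({exon: 0 for exon in exons})
    for position in variant_positions:
        for exon, (start, end) in exons.items():
            if start <= position <= end:
                counts[exon] += 1
                break  # exons do not overlap, so stop at the first hit
    return counts


# Hypothetical toy coordinates, chosen only to make the example readable.
toy_exons = {16: (1000, 1150), 17: (1400, 1520), 18: (1800, 1900)}
toy_cancer_variants = [1010, 1100, 1405, 1500, 1519, 1700, 1850]

per_exon = count_variants_per_exon(toy_exons, toy_cancer_variants)
for exon in sorted(per_exon):
    print(f"exon {exon}: {per_exon[exon]} variant(s)")
```

In practice the counts would also be weighed against exon length and the number of common variants before calling an exon enriched.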



Critical illness is characterized by the dysfunction of one or more organs, referred to as multiple organ dysfunction syndrome (MODS), which is caused by an inciting event such as major trauma, surgery or infection. Since an inflammatory response occurs dynamically alongside immunosuppression and cell death, there is growing interest in monitoring biomarkers for cell death and inflammation in biofluids to predict MODS. A potential biomarker for MODS in plasma is cell-free DNA (cf-DNA), which is typically released by dying cells and microorganisms. Not only nuclear cf-DNA but also mitochondrial cell-free DNA (cf-mtDNA) might be a potential biomarker. Both might therefore serve as an indication of disease severity and as a predictor of mortality in critical illness.

The research in this traineeship is an exploration for a new research project into the prediction of cell death, infection and inflammation. We explore the potential of nanopore sequencing to detect cell-free DNA in plasma. Nanopore sequencing is a third-generation sequencing technique capable of sequencing whole linear DNA molecules. It was developed by Oxford Nanopore Technologies (ONT); one of its devices is the MinION, a portable sequencer. Because the MinION generates a large amount of data about the sequenced DNA fragments, multiple tools are available to process this information (for example Guppy, Minimap2 and NanoPack), and some tools for quality control and data exploration were compared on multiple properties.

The first results of this exploration are promising. Experiments were carried out under two conditions: a control and endotoxemia induced with lipopolysaccharide (LPS), which induces MODS. First of all, we were able to detect cf-DNA isolated from plasma samples. The concentration of cf-DNA in the LPS-treated mouse is approximately 40 times greater than in the control mouse, leading to more DNA fragments sequenced by the MinION in the LPS condition. Plotting the number of reads against the read length reveals a pattern of peaks that seems to indicate apoptosis, although this has not been confirmed. When the number of reads per chromosome is plotted, excluding chrUn and chr*_random contigs and normalizing for chromosome length, most chromosomes have an equal share, with some exceptions. We were also able to detect cf-mtDNA, although in an unexpectedly low number of reads. Since mtDNA is circular, a possible explanation is that more cf-mtDNA was present but was not fragmented and therefore not sequenced. The mtDNA reads from the control are longer than those from the LPS condition, indicating less fragmentation of the (mt)DNA. The mtDNA reads in the LPS condition are not evenly distributed over the mitochondrial genome, with most reads mapping within a range of 2.5 kb. Apart from reads aligning to the mouse genome, we also found foreign cf-DNA, consisting of 65% bacteria, 33% eukaryota and 2% viruses and archaea. Approximately 10% of the bacterial sequences could be identified as possibly originating from the intestines. After alignment to the mouse genome and to a microorganism database, between 17% and 25% of the reads remained unidentified.

The overall conclusion is that the exploration was successful, detecting nuclear, mitochondrial and foreign cell-free DNA. Suggestions for future work include confirming the presence of cf-mtDNA through qPCR and sequencing samples treated with restriction enzymes that cleave mtDNA. If the same 2.5 kb region recurs in further experiments, other applications that use this information to predict MODS could be tested. An expert in microbiology could also be consulted to gain more insight into the microorganisms found. Further research into the unidentified reads is also needed to identify the origin of these sequences.
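
The per-chromosome comparison described above (dropping chrUn and chr*_random contigs, then normalizing for chromosome length) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the traineeship: the function name is hypothetical, and the read counts, contig names and chromosome lengths are toy values.

```python
import re

# Contigs to leave out: unplaced (chrUn...) and unlocalized (..._random).
EXCLUDED_CONTIG = re.compile(r"^chrUn|_random$")


def normalized_read_share(read_counts, chrom_lengths):
    """Return each chromosome's share of reads after length normalization.

    read_counts: dict mapping contig name -> number of aligned reads.
    chrom_lengths: dict mapping contig name -> length in base pairs.
    """
    kept = {
        name: count
        for name, count in read_counts.items()
        if not EXCLUDED_CONTIG.search(name)
    }
    # Reads per base pair, so long chromosomes are not favoured.
    density = {name: count / chrom_lengths[name] for name, count in kept.items()}
    total = sum(density.values())
    if total == 0:
        return {}
    return {name: value / total for name, value in density.items()}


# Toy values for illustration only (not real mouse chromosome lengths).
toy_counts = {
    "chr1": 200,
    "chr2": 100,
    "chrUn_GL456239": 50,
    "chr1_GL456210_random": 30,
}
toy_lengths = {"chr1": 2000, "chr2": 1000}

share = normalized_read_share(toy_counts, toy_lengths)
for name, value in share.items():
    print(f"{name}: {value:.2f}")
```

With these toy numbers both chromosomes have the same read density, so each ends up with an equal share, which is the pattern reported for most chromosomes.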


Universiteitsplein 1
2000 Antwerpen


Traineeship supervisor
Tom Vanden Berghe
+32 32659250
Traineeship supervisor
Ingrid De Meester