Abstract advanced bachelor of bioinformatics 1 2019-2020: Identification of candidate drug targets for the treatment of pediatric tumors
Neuroblastoma is an aggressive embryonal tumor of the sympathetic nervous system in children, which can regress spontaneously or grow and metastasize with resistance to multiple therapies. Neuroblastoma is characterized by DNA copy number alterations (CNAs). Patients with neuroblastoma are categorized into risk groups (high risk and other risk) depending on the characteristics of the tumor, and only a subset of the patients in these groups shows MYCN amplification.
Some oncogenic driver genes, such as MYCN, have already been identified, yet many driver genes located in the DNA CNAs remain to be found. To detect these driver genes, RNA expression and clinical data are combined using multi-omics network inference. These potential novel drivers and their regulators can open up a new approach to targeted therapy.
The starting point for this investigation is the clinical and RNA expression data of 497 patients. My task during the traineeship was to analyze this dataset in the pre- and post-processing steps for network inference. The dataset contained several clinical variables, including risk group and MYCN amplification status. A differential expression analysis (DE analysis) was performed in R (RStudio) with the edgeR package. This involved several pre-processing steps: log transformation, filtering, normalization and clustering of the dataset. Filtering retained only the genes that were expressed in at least three samples. Normalization was performed with the Trimmed Mean of M-values (TMM) method. For the clustering step, both Multidimensional Scaling and Principal Component Analysis were performed. The DE analysis between high-risk MYCN-amplified and high-risk non-MYCN-amplified patients yielded no differentially expressed genes.
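The filtering and log transformation steps can be illustrated with a minimal Python sketch. It is not the edgeR code used in the traineeship: the function name, the input layout, the definition of "expressed" as a non-zero count, and the prior count are all assumptions made for illustration, and TMM normalization is left out.

```python
import math

def filter_and_log_cpm(counts, min_samples=3, prior_count=1.0):
    """counts: dict mapping gene -> list of raw read counts, one per sample.

    Assumes every sample keeps at least one read after filtering.
    """
    # Keep genes with a non-zero count in at least `min_samples` samples
    # ("expressed" is assumed here to mean count > 0).
    kept = {g: c for g, c in counts.items()
            if sum(1 for x in c if x > 0) >= min_samples}
    if not kept:
        return {}
    n_samples = len(next(iter(kept.values())))
    # Library size per sample, recomputed on the retained genes.
    lib_sizes = [sum(c[j] for c in kept.values()) for j in range(n_samples)]
    # log2 counts-per-million; the prior count avoids log(0).
    return {g: [math.log2((c[j] + prior_count) / lib_sizes[j] * 1e6)
                for j in range(n_samples)]
            for g, c in kept.items()}
```

With the default `min_samples=3`, a gene detected in only one or two samples is dropped before any transformation is applied.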
The next step was to perform a feature extraction of genes for network inference. By extracting highly variable genes, we retain as much information as possible for statistical learning while removing possible noise. This required a variance cut-off value. Based on a histogram of the gene variances, we chose a cut-off of 0.5. I also prepared “regulator lists” for the network inference using several databases: HumanTFDB (part of AnimalTFDB), CR2Cancer, EpiFactors and Ensembl. This involved downloading the data, wrangling it to obtain correct gene identifiers, and retaining only the non-redundant genes with expression information in the dataset.
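As a rough sketch of these two steps, the Python code below keeps genes whose variance exceeds the cut-off and builds a non-redundant regulator list restricted to genes with expression data. The function names and input layouts are hypothetical, and the use of the sample variance with a strict "greater than" comparison is an assumption.

```python
from statistics import variance

def select_variable_genes(expr, cutoff=0.5):
    """expr: dict mapping gene -> list of log-expression values across samples."""
    # Sample variance per gene; keep genes above the cut-off.
    return {g: v for g, v in expr.items() if variance(v) > cutoff}

def build_regulator_list(sources, expressed_genes):
    """sources: iterable of gene-identifier collections, one per database."""
    merged = set()
    for genes in sources:
        merged.update(genes)  # the union removes redundant entries
    # Retain only regulators that have expression data.
    return sorted(merged & set(expressed_genes))
```

Both functions assume that gene identifiers have already been converted to one common naming scheme, which is the data wrangling step described above.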
For post-processing, I focused on functional annotation using Gene Set Enrichment Analysis (GSEA), applied both to the genes from the DE analysis and to a set of co-expression/co-regulatory modules obtained from the network inference. GSEA often starts from a pre-ranked list of genes based on log fold changes. The clusterProfiler package in R and MSigDB were used to perform the GSEA. This is the most recent step in the investigation, and the results are still being interpreted.
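To show what a pre-ranked GSEA computes, the following Python sketch calculates the running-sum enrichment score for one gene set. It is an illustration only, not the clusterProfiler implementation: the function name and input layout are assumptions, and the permutation-based significance testing is omitted.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """ranked: list of (gene, score) pairs sorted by descending score."""
    # Weight of each hit is |score|^p; genes outside the set get weight 0.
    hits = [abs(s) ** p if g in gene_set else 0.0 for g, s in ranked]
    hit_total = sum(hits)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")
    running, best = 0.0, 0.0
    for (gene, _), weight in zip(ranked, hits):
        if gene in gene_set:
            running += weight / hit_total  # step up at a hit
        else:
            running -= 1.0 / n_miss        # step down at a miss
        if abs(running) > abs(best):
            best = running                 # maximum deviation from zero
    return best
```

A score near +1 means the gene set is concentrated at the top of the ranking (for example, the most upregulated genes), and a score near -1 means it is concentrated at the bottom.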
Abstract advanced bachelor of bioinformatics 2 2019-2020: Comparative genome analysis of Gardnerella species with a focus on taxonomy
Bacterial vaginosis (BV) is a disturbance of the healthy vaginal microbiome in which the lactobacilli are replaced by anaerobes such as Gardnerella vaginalis. Under normal circumstances, the lactic acid-producing lactobacilli ensure a fairly high acidity in the natural environment of the vagina, so their replacement decreases the acidity (raises the pH). Bacterial vaginosis is associated with premature birth and an increased incidence of sexually transmitted infections, including HIV.
Gardnerella vaginalis, the key pathogen in BV, has recently been shown to comprise 13 different species, based on the comparison of 81 genomes. Some of these 13 species were suggested to be more virulent than others. My task during the traineeship was to update this taxonomy. To do so, all genomes of Gardnerella vaginalis available from NCBI were downloaded. Python was used to extract different kinds of data from these genomes, first of all to check whether the sequences did indeed belong to Gardnerella vaginalis. This check used the 16S ribosomal RNA region of the genome; if that region was not present in the data, other genes were used for quality control instead. The control gene was used as the query in a BLAST search: if the top hit was Gardnerella vaginalis, the genome passed the quality control; otherwise, the genome could not be classified as Gardnerella vaginalis. The second part of the traineeship was to update the figure in which the ANI (average nucleotide identity) and the DDH (DNA-DNA hybridization) are placed next to each other to properly subdivide Gardnerella vaginalis into its different subspecies.
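The top-hit rule of the quality control can be sketched in Python as follows. The layout of the hit records (a list of dictionaries with a description and a bit score) and the function name are hypothetical; the sketch assumes the BLAST results have already been parsed into that form and that the top hit is the one with the highest bit score.

```python
def passes_quality_control(blast_hits, expected="Gardnerella vaginalis"):
    """blast_hits: list of dicts with 'subject' (hit description) and 'bitscore'."""
    if not blast_hits:
        return False  # no hit at all: the genome cannot be confirmed
    # The top hit is taken to be the hit with the highest bit score.
    top = max(blast_hits, key=lambda h: h["bitscore"])
    return expected.lower() in top["subject"].lower()
```

Because the expected organism is a parameter, the same check can be reused for other bacteria.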
Both the average nucleotide identity and the DNA-DNA hybridization were calculated on the genomes obtained from NCBI. The results were gathered in Excel and filtered.
Gathering the 16S ribosomal RNA sequences, retrieving the genome files and running the ANI tool were automated using Python. The only input needed is the CSV file that can be downloaded from the NCBI genome page of a bacterium. The script can therefore also be used to study bacteria other than Gardnerella vaginalis. The DDH, the quality control and the filtering of the results into the desired structure were done manually due to a lack of time, but will be automated by the next trainee.
In this figure, different groups are visible that separate the genomes into subspecies. This can be important for the next step, the analysis of virulence genes: these genes can differ between the groups, and the activity of a specifically designed bacteriophage can therefore differ between the groups as well.
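One way to derive such groups programmatically, rather than reading them from the figure, is to link genomes whose pairwise ANI exceeds a threshold. The Python sketch below does this with single-linkage grouping. It is not the procedure used in the traineeship: the function name, the input layout and the 95% default threshold are assumptions chosen for illustration.

```python
def group_by_ani(genomes, ani, threshold=95.0):
    """ani: dict mapping (genome_a, genome_b) -> ANI percentage for that pair."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the group representative (with path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())
```

Single linkage places two genomes in the same group as soon as a chain of above-threshold pairs connects them, so the result should be checked against the DDH values before drawing taxonomic conclusions.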
The last phase of the internship was to use the virulence genes reported in publications and provided during the internship to determine which genes are present in the genomes of these bacteria, and which genes to use in phage therapy to target these bacteria without affecting the other bacteria present in the microbiome. In this way, specific bacterial species can be eradicated from a microbiome to restore its balance.