Howest, Biomedical Laboratory Technology and Bioinformatics programme
Abstract traineeship (advanced bachelor of bioinformatics) 2017-2018 1: The development of an application for chimerism research after a hematopoietic stem cell transplantation
Leukemia patients can need a bone marrow transplant so that they can develop healthy blood cells again. From that moment on, the blood cells of the patient (the host) should contain the DNA of the donor, meaning that healthy blood cells are being produced from the transplanted cells.
Chimerism is the presence of two genetically distinct sets of DNA in one organism. In this context, a higher chimerism percentage means that a larger share of the blood cells carries donor DNA. After a successful transplant the patient should therefore show a high degree of chimerism.
The molecular laboratory of AZ Sint-Jan measures chimerism in a leukemia patient in order to check if a transplant worked for the patient. Multiple measurements are performed to provide a follow-up for the patient.
The tool takes data generated with GeneMapper 5 and determines the number of peaks for donor, patient and sample per marker. The type of each marker is assigned based on these peaks. Type I markers have no shared alleles between donor and host, type II markers use a shared and a non-shared allele to calculate chimerism, and type III markers are not informative. When stutter interference is present, a stutter ratio is calculated and used as a criterion to exclude markers from the calculation of the total chimerism percentage.
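The marker typing described above can be illustrated with a minimal Python sketch. It is not the tool's code: the function name `classify_marker` is hypothetical, and the rule that a marker is non-informative only when donor and host carry identical alleles is a simplifying assumption, since the exact criteria are not given here.

```python
def classify_marker(donor_alleles, host_alleles):
    """Assign a marker type from donor and host alleles (simplified, assumed rule)."""
    donor, host = set(donor_alleles), set(host_alleles)
    shared = donor & host
    if not shared:
        return "I"    # no alleles shared between donor and host
    if donor != host:
        return "II"   # at least one shared and one non-shared allele
    return "III"      # identical alleles: assumed non-informative


# Alleles are given as repeat numbers for one marker.
print(classify_marker([12, 14], [15, 17]))  # type I
print(classify_marker([12, 14], [14, 16]))  # type II
print(classify_marker([12, 14], [12, 14]))  # type III
```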
The result of the calculations is a total percentage for all type I markers, all type II markers and both type I & type II markers. A table is shown where values can be checked for an alternative total calculation. The tool can produce a report and a raw data text file of the results.
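How the three totals could be derived from per-marker results is sketched below in Python. This is an illustration under stated assumptions, not the tool's implementation: the function name `total_chimerism`, the dictionary keys `type`, `percent` and `excluded`, and the use of an arithmetic mean over the retained markers are all hypothetical.

```python
from statistics import mean


def total_chimerism(markers, types=("I", "II")):
    """Mean chimerism percentage over markers of the given types.

    Markers flagged as excluded (for example because of their stutter
    ratio) are left out. Returns None when no marker qualifies.
    """
    values = [
        m["percent"]
        for m in markers
        if m["type"] in types and not m.get("excluded", False)
    ]
    return mean(values) if values else None


markers = [
    {"type": "I", "percent": 90.0},
    {"type": "II", "percent": 80.0},
    {"type": "I", "percent": 70.0, "excluded": True},  # dropped from totals
    {"type": "III", "percent": None},                  # not informative
]
total_type_1 = total_chimerism(markers, types=("I",))
total_type_2 = total_chimerism(markers, types=("II",))
total_both = total_chimerism(markers)
```

Toggling the `excluded` flag of a marker mirrors checking or unchecking a value in the table for an alternative total.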
An additional part of the project was the validation of the tool. Dr. F. Nollet provided data, which were processed by both the lab software and the webtool. The results were then compared to check the tool. The validation had to be run three times. In the first validation run, stutter ratios were not being detected; this error was solved by adjusting the standard stutter range from 0.2 to 0.5. The second validation run revealed that stutter ratios were being calculated on irrelevant peaks, so extra lines of code were added to exclude these peaks. The third and last validation run revealed no further errors, and the tool slightly outperforms the lab software when calculating stutter for the amelogenin marker. The tool is therefore considered validated with GeneMapper data and ready to use.
Abstract traineeship (advanced bachelor of bioinformatics) 2017-2018 2: Bio-informatic study of the gene NLRP7 and the possible role in Inflammatory Bowel Disease
This project studied the gene NLRP7 and the mutations in this gene that may play a role in the development of inflammatory bowel disease (IBD). It also examined which amino acids have to be changed to introduce these mutations in the mouse genome.
The research started with compiling a list of all genes in the NLR gene family and the different isoforms of NLRP7. For each member, the list contained the protein subfamily, symbol, full name, aliases, chromosome, start and end position on the chromosome, and the protein identifier. This was done for both the human and the mouse genome; slightly more than 50 members were found in each. The flanking genes were then studied to find possible duplications that occurred during evolution and to identify closely related genes. Remarkably, the mouse genome does not contain NLRP7.
Based on the NLR family list, a multiple sequence alignment of the protein sequences of all family members was performed with the tool MUSCLE in order to study the domains conserved between the family members. This also makes it possible to compare, across family members, the regions that can contain the possibly deleterious mutations in NLRP7. A phylogenetic tree was then built from the MUSCLE output with the application Simple Phylogeny. It showed that the proteins NLRP2 and NLRP7 are closely related, so NLRP2 can be used as an alternative for NLRP7. Pairwise alignments of some proteins in the NLR* family were then performed with the EBI tools.
Next, the protein structure was predicted. First, the Protein Data Bank (PDB) was searched for homologues of NLRP7 by running a BLAST search with the protein against the RCSB protein database. Several homologues were found and loaded into PyMOL; these structures give an impression of what the protein looks like. To obtain more reliable results, the NLRP7 protein was modelled with several tools.
The first tool used was SWISS-MODEL, which models the protein against one similar protein in the protein database. It turned out that few proteins in the database have a high sequence similarity to NLRP7; the highest was 5IRL, a NOD2 protein from rabbit, with 33% sequence identity. To overcome this problem, a second tool, MODELLER, was used. Its advantage is that multiple proteins, or parts of proteins, can be used as templates for modelling another protein such as NLRP7. The algorithm calculates several models (the number is configurable), from which the best are chosen by inspecting their Ramachandran plots in RAMPAGE. These models can be loaded into PyMOL, where they can be aligned with other proteins and analyzed.
Next, a differential expression analysis was performed to investigate whether the gene NLRP7 is differentially expressed in the IBD population versus the non-IBD (control) population. GEO datasets were first searched on the NCBI website and then analyzed further in R with a script. The problem was that few datasets gave clear results; for example, some datasets show much variation in the control group. One dataset gave results with somewhat higher significance, but the significance was still low. This can be verified in the GEO profiles, which give some indication. For the mouse, no suitable datasets or profiles comparing IBD and non-IBD mice were found.
The evolutionary history of the genes NLRP2 and NLRP7 was also investigated. A multiple sequence alignment of the NLRP2 and NLRP7 proteins from different species was made with MUSCLE. Next, the UCSC Genome Browser was used to check for a number of organisms whether they contain the genes NLRP7 and/or NLRP2.
This also indicates where the genes could have originated. From these results, an evolutionary tree of the different classes was made.
Finally, post-translational modifications were investigated with prediction servers, mostly the server “elm.eu”. It was predicted whether the positions where the possibly deleterious mutations can occur could undergo any modification after translation, and what happens if the mutations are introduced in the protein.
The conclusion is that modifying the mouse gene Nlrp2 can be attempted, and that it is known which amino acids have to be changed. Introducing the mutation can give a better understanding of its role in IBD and possibly help in the development of medicine.
Abstract traineeship (advanced bachelor of bioinformatics) 2016-2017: Development of a webtool for chimerism calculations after stem cell transplantation
Stem cell transplantations are a widely used treatment for diseases like leukemia. To make sure the new stem cells are producing healthy white blood cells, the patient needs to be tested at regular intervals. To determine whether the treatment was successful, the chimerism percentage must be calculated. This percentage expresses the number of blood cells derived from the new stem cells (of the donor) relative to the number of blood cells derived from the original stem cells of the patient.
Data is collected via a variety of biochemical techniques based on the variable number of tandem repeats (VNTR) in genes. These repeats vary between people and are therefore useful for identifying the origin of the blood cells. The chimerism percentage can be calculated from this data via a formula. There are different formulas depending on the VNTR type.
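For the simplest case, a VNTR at which donor and patient share no alleles, a commonly used formula divides the summed donor peak areas by the summed donor and patient peak areas. The Python sketch below illustrates only that case. The formulas actually implemented in QuickChim are not given here, and the function name `percent_donor_chimerism` and its arguments are hypothetical.

```python
def percent_donor_chimerism(donor_areas, patient_areas):
    """Percent donor chimerism at one locus with no shared alleles.

    donor_areas:   areas of the donor-specific peaks in the follow-up sample
    patient_areas: areas of the patient-specific peaks in the follow-up sample
    """
    donor = sum(donor_areas)
    patient = sum(patient_areas)
    total = donor + patient
    if total == 0:
        raise ValueError("no peak area found for this locus")
    return 100.0 * donor / total


# Two donor peaks and one patient peak at the same locus.
result = percent_donor_chimerism([500.0, 400.0], [100.0])
print(f"{result:.1f}% donor chimerism")
```

Loci with shared alleles need a different formula, because the shared peak contains signal from both donor and patient.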
However, these calculations are still done manually in some laboratories. This is time-consuming and there is always a chance of human error. The goal of this project was to develop a webtool, called QuickChim, that executes the chimerism calculations automatically to speed up the follow-up process.
QuickChim is a webtool written in PHP, HTML and CSS. The information about the kits used to analyze the samples is stored in a MySQL database.
To calculate the percent donor chimerism, the user needs to upload three files: one for a sample taken from the donor at the start of the treatment (the donor reference file), one for a sample taken from the patient at the start of the treatment (the patient reference file) and one for the sample taken for follow-up analysis. These files contain information about every peak detected by the biochemical technique used to analyze the sample. The information relevant to this tool is the color, size and area of each peak.
When the files are uploaded and the parameters are set, the user can submit the data and QuickChim will calculate the chimerism percentage for every VNTR. When the calculations are successful, two buttons are generated. The first produces a PDF report of the results, which can be downloaded and used to report the results to the doctor in charge of the treatment. The second lets the user download a text file containing the parameters used for the analysis, together with tables of the information used from all three uploaded files.