University of Antwerp, Biomina, Adrem Data Lab
Abstract 2019-2020: Building a tool for processing data and visualizing the result from a compound-protein interaction predictor
Drugs are often developed to target proteins that participate in many cellular processes. Characterizing the interactions between these proteins and compounds is an important part of drug development. During the internship, the focus was on addressing this characterization problem through machine learning and presenting the solution in a way relevant to the field.
The starting point was source code providing the API that builds the machine learning model used for predicting compound-protein interactions (CPI). The goal of the internship was to create a pipeline for this API. The pipeline follows the general steps of any machine learning project: data pre-processing, model creation and model evaluation. The result provides a command line tool as well as a user interface in the form of a web application.
CPI prediction can be framed as a binary classification problem: a compound-protein pair either interacts or does not. The integrated source code allows the deep learning model to use the Weisfeiler-Lehman graph kernel to extract features from the compounds and graph convolutional neural networks (CNN) to encode the protein sequences.
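To make the Weisfeiler-Lehman idea concrete, here is a minimal pure-Python sketch of the relabeling step on a toy molecular graph. It is not the internship's source code: the function name `wl_features`, the input format and the ethanol example are assumptions chosen for illustration, and a real pipeline would typically compress the labels into integer IDs.

```python
from collections import Counter


def wl_features(labels, edges, iterations=2):
    """Count Weisfeiler-Lehman labels for one small undirected graph.

    labels: dict mapping node id -> initial label (e.g. an atom symbol)
    edges: iterable of (u, v) pairs, one per undirected bond
    (Hypothetical helper, written for illustration only.)
    """
    # Build an adjacency list from the undirected edge list.
    neighbours = {node: [] for node in labels}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # New label = own label plus the sorted labels of all neighbours.
        updated = {}
        for node in current:
            around = sorted(current[n] for n in neighbours[node])
            updated[node] = current[node] + "(" + ",".join(around) + ")"
        current = updated
        features.update(current.values())
    return features


# Toy example: heavy atoms of ethanol, C-C-O (hydrogens omitted).
ethanol = wl_features({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)], iterations=1)
print(ethanol)
```

After one iteration each atom's label also encodes its direct neighbours, so the two carbons, which start with the same label, become distinguishable. The resulting label counts can serve as a feature vector for the compound.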
The pipeline not only provides a command line tool to train future classification models, but also to test and evaluate them with user-provided data. Using the Django framework, a UI was built around this part of the pipeline so that end users can test their data inside a user-friendly webtool.
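As a rough illustration of how such a command line tool could be organised, the sketch below uses Python's standard `argparse` module with one subcommand for training and one for evaluation. The program name `cpi-pipeline`, the subcommand names and every option are hypothetical; the abstract does not describe the actual interface.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical `train` and `evaluate` subcommands."""
    parser = argparse.ArgumentParser(prog="cpi-pipeline")  # hypothetical name
    sub = parser.add_subparsers(dest="command", required=True)

    # Training: pre-process the data and fit a new model (assumed options).
    train = sub.add_parser("train", help="pre-process data and train a model")
    train.add_argument("--data", required=True, help="training data file")
    train.add_argument("--epochs", type=int, default=10)

    # Evaluation: run a trained model on user-provided data (assumed options).
    evaluate = sub.add_parser("evaluate", help="test a trained model")
    evaluate.add_argument("--model", required=True, help="trained model file")
    evaluate.add_argument("--data", required=True, help="test data file")
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(["train", "--data", "interactions.txt"])
print(args.command, args.data, args.epochs)
```

Keeping training and evaluation as separate subcommands mirrors the split described above, where the web UI only needs to wrap the evaluation part.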
Abstract traineeship advanced bachelor of bioinformatics 2017-2018: Development of an immunologic affinity prediction webtool
Current T cell epitope prediction tools typically focus on predicting peptide binding and presentation by major histocompatibility complex (MHC) molecules, which are located on the surface of antigen-presenting cells. These tools perform these predictions accurately. What they do not include, however, is the prediction of recognition of the peptide-MHC complex by T cell receptors (TCR). The ADReM Data lab is currently developing a classification model, based on random forest classifiers, to predict recognition of a peptide by a TCR.
Also in development is a webtool called TCRex, which processes input data using these classifiers. With TCRex, it is possible to analyse TCR (CDR3) sequence data against a number of selected epitopes and predict the probability that a given sequence binds each epitope. The user can also train and test a new classifier with their own training and test data.
The TCRex webtool is developed in the Django web framework. Django is an open source, high-level Python web framework based on the model-view-controller pattern. It is built by experienced developers and takes care of much of the hassle of web development, so it allows you to rapidly develop a web application “without the need of reinventing the wheel” [djangoproject.com, 2005].