
VIB - UG - Department of Plant Systems Biology (PSB), UGent

Traineeship topic advanced bachelor of bioinformatics 2015-2016: Transcriptome analysis of maize varieties
 
Traineeship topic (2010-2011) 1: Automated processing of NGS data for plant genomes
A good understanding of the processes of eukaryotic gene transcription is essential for studying their impact on a phenotype. Nevertheless, comprehensive studies on this subject are still lacking, even for modern model systems. Automatically retrieving and processing RNA-seq data is a crucial step in transforming millions of reads into a transcriptome omnibus for different species. This is not only important for ‘discovering’ new genes; the analysis will also give us better insight into the different transcripts and their isoforms that are present in different tissues during particular developmental stages and/or under stress conditions. Complementary to this, automatically obtaining ChIP-seq data (protein-DNA interactions) and the subsequent analysis, such as aligning the reads and locating peaks in the alignment frequency, will be an important data source for unraveling transcription factor regulatory networks.
The objectives of this project are therefore to i) automatically check which data are available in the Sequence Read Archive (SRA); ii) set up a pipeline that retrieves the requested data for particular species (e.g. using SRA) and performs the initial processing; iii) carry out advanced RNA-seq analyses: transcript reconstruction and analysis of differentially expressed genes; iv) perform ‘peak calling’ in the ChIP-seq data sets.
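As a rough illustration of objective iv, the sketch below treats a peak as a contiguous run of positions whose read coverage reaches a threshold and reports the position of highest coverage as its summit. The function name `call_peaks`, the threshold and the toy coverage values are assumptions made for this example; real peak callers use statistical models and control samples rather than a fixed cut-off.

```python
def call_peaks(coverage, threshold):
    """Return (start, end, summit) tuples for runs where coverage >= threshold.

    Positions are 0-based and end is exclusive.
    """
    peaks = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= threshold and start is None:
            start = pos  # a new run begins
        elif depth < threshold and start is not None:
            region = coverage[start:pos]
            summit = start + region.index(max(region))
            peaks.append((start, pos, summit))
            start = None
    if start is not None:  # run extends to the end of the sequence
        region = coverage[start:]
        summit = start + region.index(max(region))
        peaks.append((start, len(coverage), summit))
    return peaks


# Hypothetical per-position read depth along a short stretch of genome
toy_coverage = [0, 1, 2, 8, 12, 9, 2, 1, 0, 7, 7, 1]
peaks = call_peaks(toy_coverage, threshold=5)
print(peaks)
```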
 
Traineeship topic (2010-2011) 2: Data integration using CORNET (Bioinformatics and Evolutionary Genomics, Dr. Stefanie De Bodt)
CORNET is a web tool that builds networks by integrating different data types, such as microarray gene expression, protein-protein interactions, subcellular localization, gene function and the biological process in which the gene is involved. Networks of nodes representing genes or proteins and edges representing relationships or interactions between genes are visualized in Cytoscape. The current version shows information for the model plant Arabidopsis (thale cress). A lot of other data is available, including transcription factor-target relationships and gene-phenotype relationships, but also analogous information in other organisms, such as the economically and agronomically interesting maize, rice and poplar. To include it, the new data must be collected in the CORNET database, and the web interface and visualization must be extended. This will make it possible to build more detailed networks, allowing us to further unravel the molecular mechanisms that control, among other things, leaf growth. In addition, we can compare whether these molecules and the interactions between them occur in multiple organisms or are specific to certain organisms. More info:
bioinformatics.psb.ugent.be/cornet and www.cytoscape.org

 
Traineeship topic (2010-2011) 3 is also possible in Evolutionary Systems Biology (Dr. Steven Maere)
Abstract traineeship advanced bachelor of bioinformatics 2020-2021: IDENTIFYING AND VISUALIZING PROTEIN CLEAVAGE SITES IN ARABIDOPSIS THALIANA
This traineeship addresses the following question: “To what extent are protein cleavage products in Arabidopsis thaliana (Arabidopsis) detectable in shotgun proteomics data and can they be accounted for in future searches?” To answer it, the trainee uses the Python and R programming languages to construct a computational pipeline that acquires, preprocesses and searches public proteomics data, alongside post hoc scripts that summarize and visualize the obtained results.
To identify protein cleavage products, two separate search indices need to be made: a tryptic and a semi-tryptic one. The tryptic and semi-tryptic indices account for full (two enzymatic termini) and partial (one enzymatic terminus) digestion of the peptides, respectively. The semi-tryptic index makes it possible to locate non-tryptic, cellular cleavage events, in other words natural cleavage sites that may have a biological meaning. These indices are built with the Crux tide-index function (https://crux.ms/commands/tide-index.html). Tide is a tool for identifying peptides from tandem mass spectra. It assigns peptides to spectra by comparing the observed spectra to a catalog of theoretical spectra derived from a database of known proteins.
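The difference between the two indices can be illustrated with a small in-silico digestion. This is a minimal sketch and not how Crux tide-index works internally: it assumes the simple rule that trypsin cuts after K or R (the proline exception is ignored), allows any number of missed cleavages, and uses hypothetical function names and a toy protein sequence. A peptide counts as tryptic when both ends lie on an enzymatic site or a protein terminus, and as semi-tryptic when exactly one end does.

```python
def cleavage_sites(protein):
    """Indices where a cut is allowed: after K or R, plus both protein termini."""
    sites = {0, len(protein)}
    for i, residue in enumerate(protein):
        if residue in "KR":
            sites.add(i + 1)
    return sites


def digest(protein, min_len=3, max_len=10):
    """Split all candidate peptides into fully tryptic and semi-tryptic sets."""
    sites = cleavage_sites(protein)
    tryptic, semi = set(), set()
    for start in range(len(protein)):
        last_end = min(len(protein), start + max_len)
        for end in range(start + min_len, last_end + 1):
            # Count how many of the two peptide ends are enzymatic
            n_enzymatic = (start in sites) + (end in sites)
            if n_enzymatic == 2:
                tryptic.add(protein[start:end])
            elif n_enzymatic == 1:
                semi.add(protein[start:end])
    return tryptic, semi


# Toy sequence, chosen only for illustration
tryptic, semi = digest("MKTAYIAKQR")
print(sorted(tryptic))
print(len(semi), "semi-tryptic candidates")
```

The semi-tryptic set is much larger than the tryptic one, which is why a semi-tryptic index costs more search time but can recover peptides that start or end at a natural cleavage site.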
The first step of this pipeline is gathering the raw Thermo (.raw) proteomics data files and preprocessing them to peak list (.mgf) files. For this project we re-analyzed three large Arabidopsis proteomic studies (PXD012708, PXD014877 and PXD013868) that are publicly available in the PRIDE repository (https://www.ebi.ac.uk/pride/). To obtain the FTP addresses of all .raw files, the pridepy package (https://github.com/PRIDE-Archive/pridepy) was used, which searches the PRIDE repository for a given project identifier. Each FTP address is then processed in a loop and the file is stored under its PRIDE project identifier. All data are downloaded multi-threaded with Axel (https://github.com/axel-download-accelerator/axel), which gives a considerable speed boost compared to wget, and converted to .mgf format using ThermoRawFileParser (https://github.com/compomics/ThermoRawFileParser). The converted .mgf files are then searched with the Crux cascade-search function (https://crux.ms/commands/cascade-search.html). This search uses the previously built indices and searches both of them in an automated, fast and statistically robust manner to find the cleavage products.
In total, 769 MS samples were analyzed, and two post hoc scripts are needed to interpret the obtained data. The first post hoc procedure summarizes all the data so that they are readable by non-experts. The summary is made with a Python script that filters the peptide identification results for identified cleavage sites. The number of peptide-to-spectrum matches (PSMs) to semi-tryptic peptides and their respective modification status (N-terminal acetylation, pyro-Glu formation or non-modified) are stored. In addition, a TargetP 2.0 prediction (http://www.cbs.dtu.dk/services/TargetP/) is incorporated in the summary to compare the identified sites with the predicted cleavage sites of N-terminal sorting signals for organellar targeting of proteins, e.g. a mitochondrial transit peptide (mTP) or chloroplast transit peptide. All this information is then written to a tab-delimited file that can easily be copied into Excel.
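The summarizing step could look roughly like the sketch below. The record fields, identifiers and values are hypothetical and do not reflect the actual Crux output columns or the trainee's script; the sketch only shows the idea of keeping semi-tryptic PSMs, counting them per cleavage site and modification status, and writing a tab-delimited table.

```python
import csv
import io
from collections import Counter

# Hypothetical PSM records; field names are illustrative, not Crux output columns
psms = [
    {"protein": "protA", "start": 54, "semi_tryptic": True, "modification": "acetyl"},
    {"protein": "protA", "start": 54, "semi_tryptic": True, "modification": "none"},
    {"protein": "protA", "start": 54, "semi_tryptic": True, "modification": "acetyl"},
    {"protein": "protA", "start": 12, "semi_tryptic": False, "modification": "none"},
    {"protein": "protB", "start": 31, "semi_tryptic": True, "modification": "pyro-Glu"},
]


def summarize(psms):
    """Count semi-tryptic PSMs per (protein, cleavage site, modification)."""
    counts = Counter()
    for psm in psms:
        if psm["semi_tryptic"]:
            counts[(psm["protein"], psm["start"], psm["modification"])] += 1
    return counts


def to_tsv(counts):
    """Render the counts as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["protein", "cleavage_site", "modification", "psm_count"])
    for (protein, site, modification), n in sorted(counts.items()):
        writer.writerow([protein, site, modification, n])
    return buffer.getvalue()


summary = summarize(psms)
table = to_tsv(summary)
print(table)
```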
The second post hoc procedure visualizes these data using an R script. The script requires a single protein identifier and outputs a graph that displays several tryptic and semi-tryptic peptide identification results (Figure 1): the number of semi-tryptic PSMs and their respective modification status, a heatmap above the x-axis showing the tryptic PSMs, a heatmap below the x-axis showing the probability that a PSM will be found (predicted with DeepMSPeptide, a deep learning algorithm for peptide detectability) and a vertical line marking the predicted cleavage site for that protein identifier.
Lastly, all of the data are processed into ‘cleavage-aware’ FASTA (.fasta) databases. A total of four FASTA files are made by adding N-terminally truncated versions of existing proteins; in other words, the cleavage sites found will be searched as protein N-termini. The first two FASTA files supplement the representative Arabidopsis proteome (Araport11, 48359 entries) with the cleavage sites that have more than 5 PSMs and with the cleavage sites matching TargetP 2.0 predictions, respectively. The last two FASTA files do the same, except that they use the full Arabidopsis proteome (including splice forms).
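A minimal sketch of how such a cleavage-aware database could be assembled is shown below. The function name, the `_Nterm` header suffix, the toy sequences and the convention that a site is the 1-based position of the new N-terminal residue are all assumptions made for illustration; the source does not specify how the truncated entries are named.

```python
def add_truncated_entries(proteome, cleavage_sites):
    """Return FASTA text with each protein plus its N-terminally truncated versions.

    proteome: dict of identifier -> sequence
    cleavage_sites: dict of identifier -> list of 1-based positions of the
        residue that becomes the new N-terminus (assumed convention)
    """
    lines = []
    for identifier, sequence in proteome.items():
        lines.append(f">{identifier}")
        lines.append(sequence)
        for site in sorted(cleavage_sites.get(identifier, [])):
            # Skip position 1 (the unmodified protein) and out-of-range sites
            if 1 < site <= len(sequence):
                lines.append(f">{identifier}_Nterm{site}")
                lines.append(sequence[site - 1:])
    return "\n".join(lines) + "\n"


# Toy proteome and one hypothetical cleavage site
toy_proteome = {"protA": "MASLTRKPEPTIDEK", "protB": "MKVLAAG"}
toy_sites = {"protA": [6]}
fasta_text = add_truncated_entries(toy_proteome, toy_sites)
print(fasta_text)
```

Filtering `cleavage_sites` beforehand (for example to sites with more than 5 PSMs, or to sites matching a prediction) would give the different database variants described above.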
After creating these cleavage-aware FASTA databases, we tested their merit on a dataset studying the effect of 4 hours of mannitol stress in Arabidopsis (PXD008900) using a standard, tryptic MaxQuant search. Searching the Araport11 representative proteins identified 15,843 tryptic peptides, while searching the extended databases with all cleavage sites or with only those matching TargetP 2.0 predictions resulted in 402 (+2.54%) and 787 (+4.97%) additional peptide identifications, respectively. This demonstrates that accounting for protein cleavage products makes a measurable contribution to tryptic peptide identification. Searching the full Araport11 proteome with splice forms results in 16,049 peptide identifications, roughly 600 fewer than when searching with protein cleavage products. Hence, instead of accounting for pre-mRNA splicing in protein databases, it could be more meaningful to consider protein cleavage.
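The reported percentages can be checked with a few lines of arithmetic. The variable names are illustrative, and the sketch assumes that the two gains are listed in the same order as the two extended databases and that the comparison with the splice-form search refers to the larger of the two extended results.

```python
baseline = 15843        # tryptic peptides, Araport11 representative proteins
extra_first = 402       # additional peptides, first extended database
extra_second = 787      # additional peptides, second extended database
splice_forms = 16049    # peptides, full proteome including splice forms


def percent_gain(extra, base):
    """Additional identifications as a percentage of the baseline."""
    return round(100 * extra / base, 2)


gain_first = percent_gain(extra_first, baseline)
gain_second = percent_gain(extra_second, baseline)
# Difference between the larger extended search and the splice-form search
difference = (baseline + extra_second) - splice_forms

print(gain_first, gain_second, difference)
```

Under these assumptions the exact difference is 581 peptides, which matches the rounded figure of about 600 given in the text.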
 
Abstract traineeship advanced bachelor of bioinformatics 2017-2018: Computational promoter and expression analysis to characterize stress regulation of Nictaba-related genes

The main characters in this story are the Nictaba lectin orthologues from Arabidopsis thaliana. Lectins are proteins that can selectively and reversibly bind to sugar structures. This class of proteins is widely represented in the plant kingdom but is also present in animals and fungi.

Some plant lectin families are constitutively expressed, while other families show an inducible expression, mainly upon stress signals. A plant can suffer from abiotic and biotic stresses. Abiotic stress includes heat, drought, heavy metals, cold and salt stress, while biotic stresses consist of insect infestations and pathogen infections.

One of these inducible plant lectins is Nictaba or Nicotiana tabacum agglutinin. It was first discovered in the leaves of tobacco and was identified as a jasmonate-inducible protein. In the genome of Arabidopsis thaliana, 31 orthologues of the Nictaba lectin gene have been identified. The Nictaba domain is often linked to another protein domain or a C- or N-terminal region. The most common domain is the F-box domain; other domains include a TIR domain (Toll/interleukin-1 receptor) and an AIG1 (avirulence induced gene 1)-type G domain.

To analyse the evolutionary relationships between the Nictaba-related genes in Arabidopsis and in other species, a phylogenetic tree was made. To obtain a clear view of the phylogeny of the Nictaba domain, the multiple sequence alignment (MEGA software) and the phylogenetic tree (RAxML) were generated from the Nictaba domain sequences only, which avoids interference from the other (non-lectin) domains. In the obtained Maximum Likelihood tree, we can observe three main clades, designated as clades A, B and C. The phylogenetic relationship is related to the corresponding protein domain architecture. Clades B and C contain the F-box Nictaba-related genes. Clade A contains separate branches for TIR-Nictaba-related lectins and proteins with only a Nictaba lectin domain.

To investigate the potential involvement of Arabidopsis Nictaba-related lectins in the plant stress response, this project aims to determine the gene expression profiles for different Nictaba orthologues in Arabidopsis thaliana and analyse the conservation of regulatory sequences in the promoter regions of these genes. Therefore a workflow was designed in which we started from the expression profiles from the Nictaba-related genes and defined a set of co-expressed genes. From this set of genes, the gene-of-interest and its co-expressed genes, a gene ontology (GO) enrichment and motif enrichment were produced. The resulting data was visualized in a graph.

To define the expression profiles, the project started with a Genevestigator search for the 31 Nictaba orthologues in Arabidopsis. We selected microarray experiments on wild-type plants covering diverse biotic and abiotic stresses (including salicylic acid, methyl jasmonate, abscisic acid, indole-3-acetic acid (auxin), salt, heat, cold, drought, Pseudomonas, Myzus) and different plant parts. The selection cut-offs were set to a fold change (FC) of 1.5 and a p-value of 0.05. Based on these results and on the organisation of the phylogenetic tree, six genes with different expression profiles were selected: AT1G80110 (PP2-B11), AT1G31200 (PP2-A9), AT2G02350 (PP2-B9), AT4G19840 (PP2-A1), AT5G52120 (PP2-A14) and AT1G65390 (PP2-A5). The genes PP2-B11, PP2-A14 and PP2-A5 were selected because they show an interesting stress-responsive expression profile. The other three genes (PP2-A9, PP2-B9 and PP2-A1) have already been studied in our research facility. All three clades of the phylogenetic tree are represented among the selected genes. PP2-A9 has almost no N- or C-terminal domain and is located in clade A together with PP2-A1 (N-terminal domain) and PP2-A5 (TIR domain). PP2-B11 (F-box) and PP2-B9 (small C-terminal domain) belong to clade B. One gene was selected from clade C: PP2-A14, with an F-box domain linked to a Nictaba domain.
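The selection cut-offs can be expressed as a simple filter. The sketch below is not Genevestigator's implementation: it assumes that the fold-change cut-off applies in both directions (at least 1.5-fold up or down) and that fold changes are given as linear ratios, and all gene names and values are hypothetical.

```python
FC_CUTOFF = 1.5
P_CUTOFF = 0.05


def is_responsive(fold_change, p_value):
    """True if expression changes at least FC_CUTOFF-fold in either direction
    (assumed interpretation) and the p-value does not exceed P_CUTOFF."""
    changed = fold_change >= FC_CUTOFF or fold_change <= 1 / FC_CUTOFF
    return changed and p_value <= P_CUTOFF


# Hypothetical (gene, condition, linear fold change, p-value) rows
rows = [
    ("geneA", "salt", 2.3, 0.010),
    ("geneA", "cold", 1.2, 0.001),     # significant but below the FC cut-off
    ("geneB", "drought", 0.5, 0.030),  # two-fold down-regulation
    ("geneB", "heat", 1.9, 0.200),     # strong change but not significant
]
responsive = [(gene, cond) for gene, cond, fc, p in rows if is_responsive(fc, p)]
print(responsive)
```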

Using their transcription profiles, sets of 100 and 200 co-expressed genes (positive and negative correlation) were identified for each of the selected Nictaba-related genes. Since co-expressed genes are assumed to be potentially involved in the same processes, a GO enrichment analysis was performed. It reveals whether a GO term (a hierarchical grouping of gene descriptions) is significantly more frequent among the co-expressed genes than in the pool of all Arabidopsis genes. In addition, we isolated the sequence 5 kb upstream and 1 kb downstream of the Nictaba-related genes to perform a motif enrichment and identify cis-regulatory elements (and linked transcription factors) enriched in the promoters of these genes.
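The two ideas in this step, ranking genes by co-expression and testing a GO term for over-representation, can be sketched as follows. The source does not name the correlation measure or the enrichment test, so Pearson correlation and a one-sided hypergeometric test are assumptions here, as are the function names and the toy data.

```python
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def top_coexpressed(query, profiles, n):
    """Rank other genes by absolute correlation so that positively and
    negatively correlated partners both count."""
    scores = {g: pearson(profiles[query], p) for g, p in profiles.items() if g != query}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:n]


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a set of n,
    when K of the N background genes carry the GO term."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


# Toy expression profiles over four conditions
profiles = {
    "query": [1, 2, 3, 4],
    "g1": [2, 4, 6, 8],   # perfectly positively correlated
    "g2": [4, 3, 2, 1],   # perfectly negatively correlated
    "g3": [1, 3, 2, 1],   # weakly correlated
}
partners = top_coexpressed("query", profiles, n=2)
p_value = hypergeom_pvalue(k=2, n=4, K=3, N=10)
print(partners, p_value)
```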

To present the data obtained from the GO and motif enrichment clearly, we processed them in Cytoscape, filtering on the q-value (p-value corrected for the false discovery rate) and on the number of hits for the enriched term in the set of co-expressed genes.
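The source does not say which false discovery rate correction was used. As one common option, the sketch below applies the Benjamini-Hochberg procedure to a list of hypothetical p-values; the function name and the input values are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values never increase
    # as the p-value rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical enrichment p-values for four terms
p_values = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(p_values)
print(q_values)
```

Terms would then be kept when their q-value falls below a chosen cut-off and they have enough hits among the co-expressed genes.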

The results of this analysis support the hypothesis that the Arabidopsis Nictaba orthologues are involved in the stress response pathways of the plant. Most Nictaba-related genes show a stress-regulated expression profile. In addition, the in-depth analysis of the selected genes retrieved similar stress-related GO terms and cis-regulatory domains. We have shown that this type of analysis can provide information about the function of the gene of interest.

 
Abstract bachelor’s thesis 2015-2016: Identification of new PGPRs in wheat
For confidentiality reasons, the abstract cannot be published.
 
Thesis abstract 2014-2015: Plant growth promoting influence of rhizobacteria
Several alternative methods exist to stimulate plant growth and yield. This research focuses on bio-fertilization, which uses bacteria to protect plants and stimulate their growth, and more specifically on plant growth-promoting rhizobacteria: free-living soil bacteria that surround and invade the root tissue.
A collection of 127 rhizobacteria was screened for plant growth promotion. An in vitro screening in Arabidopsis thaliana was used to search for bacteria that stimulate root and/or shoot growth.
In the long term, the goal of this research topic is to find rhizobacteria that stimulate crop growth in the field. One of these crops is maize, and the bacteria selected in this research will be tested on this crop in the future.
 
Thesis abstract 1 2013-2014: Clarification and characterization of secondary metabolites of bioenergy crops
Because of the growing demand for energy and the limited supply of fossil fuel, alternative energy sources with a low ecological impact are being sought. One possible alternative is energy from plant biomass. It can be obtained through saccharification and fermentation of the sugars present in the plant, which converts the sugars into bioethanol. The limiting factor in this procedure is lignin. Altering the structure of lignin would change its composition and concentration, leading to a higher production of bioethanol.
To accomplish this, the whole composition of the plant should be known. Therefore we focus on the metabolome of the plant. Young and old cells are analyzed so that the composition of the plant is known at different stages. This broad view offers a chance to unravel new biosynthesis pathways of different compounds and to learn their function within the plant itself.
This thesis covers the different ways of identifying these compounds. Some of the unknown compounds can be characterized with a CSPP network, but they are then still only assigned to a class: benzenoids, oligolignols, flavonoids or flavonolignans. To make sure that the identification of a compound is correct, a control is needed. This is done by purification and structural elucidation of that particular compound, which has the advantage that new fragmentation pathways can be found for the compound by analyzing the fraction.
During the research, seventeen new compounds were classified with the CSPP network. The fragmentation pathways of seven classified compounds were found. Another seven compounds were identified and had their fragmentation pathways clarified.
 
Thesis abstract 2 2013-2014:
Influence of strigolactones on the in vitro regeneration of Arabidopsis thaliana
Up until today, in vitro regeneration of plants remains important to generate a large amount of identical clones of trees and horticultural and agricultural crops at a low cost. The precise mechanisms of plant regeneration are however still not completely understood and a lot of plant species do not regenerate optimally. That is why research of in vitro regeneration processes is very important.
The ‘rhizosphere’ research group of the VIB recently discovered that a new group of plant hormones, termed strigolactones, plays an important role during in vitro regeneration. In Arabidopsis thaliana, mutants in four important genes that are involved in biosynthesis and/or perception of strigolactones (CCD MORE AXILLARY GROWTH (MAX) 3, MAX4, a cytochrome P450 MAX1 and MAX2) display lower regeneration rates than wild-type plants.
The first part of this thesis elaborates on the role of MAX2 during in vitro regeneration. Based on GUS staining, we conclude that MAX2 is expressed throughout the plant, with the strongest expression in the developing parts of the root explants. MAX2 was then put under the control of different promoters (APL, NST3, SCR and WOX4) to investigate the influence of specific MAX2 expression on the in vitro regeneration of Arabidopsis thaliana, as well as on the lateral root density (LRD).
Next, qPCR analyses were done to check a MAX2 overexpression line. The relative expression of MAX2 in this line turned out to be higher than in the wild type, which shows that MAX2 is indeed overexpressed in the overexpression line. When MAX2 was overexpressed, in vitro regeneration was much better than in the max2-1 line, but the percentage of shoot regeneration was nevertheless lower than in Col-0 (wild type), indicating that specific levels of MAX2 are required for optimal in vitro regeneration rates.
In the last part of this thesis, pAPL:GUS seeds were treated with GR24, a strigolactone analogue. This treatment was compared to a mock treatment and did not show any difference. The APL marker is a marker for the vasculature, as shown by the blue staining that appeared there, and was investigated because the vasculature is the main site of MAX2 expression. These experiments confirmed that defects in in vitro regeneration are related to MAX2 defects rather than general defects in vascular development.
Altogether, this work has contributed to the understanding of the role of MAX2 during in vitro regeneration, and future work of the research group will focus on this line of research.
 
Thesis abstract 2010-2011:
Adapting CORNET for data integration
 

This bachelor’s thesis deals with integrating transcription factor-target data of the model plant Arabidopsis thaliana and developing a tool for visualizing these data in the existing CORNET web tool. CORNET is a web tool that builds and visually displays networks by integrating different data types. This is done by two tools on the CORNET site, the protein-protein interaction tool and the co-expression tool.

To achieve this, the structure of the CORNET database is first adapted so that all required information can be integrated. The data are then loaded into the database.
Once the database contains the required information, the TF tool is developed. This becomes a new tool in addition to the existing tools on the site.
The output of the TF tool is provided in text format or visualized in Cytoscape. In Cytoscape, the result is displayed as a network of one or more TFs and their targets.
This visualization shows not only the interactions between the transcription factor and the target gene but also the regulation that the transcription factor exerts on the target gene and the reliability of the data.
After its development, the TF tool was integrated with the protein-protein interaction and co-expression tools, so that the result of one tool can be supplemented with information from another tool.
In the future, other data types can still be integrated into CORNET. Analogous information from other organisms can also be added, such as the economically and agronomically interesting maize, rice and poplar. This makes it possible to compare whether the molecules and the interactions between them are specific to a particular organism or occur in multiple organisms.
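To give an idea of what a text output of TF-target relationships could look like, the sketch below writes edges in the simple interaction format (SIF) that Cytoscape can import, with one "source, interaction, target" line per edge. The records, the regulation labels and the confidence labels are hypothetical and do not describe the actual CORNET output.

```python
# Hypothetical TF-target records: (TF, target, regulation, confidence)
edges = [
    ("TF1", "geneA", "activates", "confirmed"),
    ("TF1", "geneB", "represses", "predicted"),
    ("TF2", "geneA", "activates", "predicted"),
]


def to_sif(edges):
    """One tab-separated 'source interaction target' line per edge."""
    return "\n".join(f"{tf}\t{regulation}\t{target}" for tf, target, regulation, _ in edges)


def targets_of(edges, tf):
    """All target genes regulated by the given transcription factor."""
    return sorted(target for source, target, _, _ in edges if source == tf)


sif_text = to_sif(edges)
print(sif_text)
print(targets_of(edges, "TF1"))
```

Edge attributes such as the confidence label do not fit in the three SIF columns and would have to be supplied to Cytoscape separately, for example as an attribute table.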

Address

Technologiepark 927
9052 Zwijnaarde
Belgium

Contacts

Traineeship supervisor
Steven Maere
steven.maere@psb.vib-ugent.be
Traineeship supervisor
Dr. Klaas Vandepoele (BIT)
klaas.vandepoele@psb.vib-ugent.be
Traineeship supervisor
Dr. Lieven Sterck
+32 (0)9 3313821
lieven.sterck@psb.vib-ugent.be
Traineeship supervisor
Kris Morreel (FBT)
Traineeship supervisor
Stephen Depuydt
Traineeship supervisor
Belen Marquez
Traineeship supervisor
Tom Viaene (FBT)
Traineeship supervisor
Sofie Goormachtig (FBT)
Traineeship supervisor
Michiel Van Bel (BIT)
Traineeship supervisor
Patrick Willems (BIT)
patrick.willems@ugent.be