ILVO Melle, plant
Abstract advanced bachelor of bioinformatics 2019-2020: Effects of tropical rainforest disturbance on genomic diversity in wild Robusta coffee
The research question for this project concerns the effects of tropical rainforest disturbance on genomic diversity in wild Robusta coffee. To complete the advanced bachelor in bioinformatics, this project focuses specifically on executing and optimizing the processing of Genotyping-by-Sequencing (GBS) DNA data and on the further analysis of these data.
The workflow starts with pre-processing the data using several tools and steps in combination with Python and R as programming languages. After pre-processing, the data is mapped to the reference genome using PEAR software. The workflow then continues with a pipeline newly developed by ILVO that is not yet published. The tool is called SMAP (Stack Mapping Anchor Points). SMAP delineate is run after the mapping to identify read mapping polymorphisms and to create the final stack positions. Next, SNPs are identified based on the final stack positions through SNP calling, which is performed with the Genome Analysis Toolkit (GATK). After SNP calling, the SNPs are filtered on several quality parameters. After SMAP delineate, SMAP haplotype is run, a read-backed haplotyping technique used to identify the haplotypes in the different SMAP regions. The SNPs are then filtered based on the haplotypes using Bedtools.
After the GBS data has been processed, the population structure analyses can be carried out. These start with Principal Component Analysis (PCA) plots to look at how the different populations are distributed across the samples. Another analysis examines the genetic differentiation between subpopulations by calculating Fst values for the individual SNPs. The goal of this analysis is to find interesting SNPs that can be used in further analyses such as amplicon sequencing.
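As a rough illustration of a per-SNP Fst calculation, the sketch below computes Wright's Fst for a biallelic SNP from the allele frequencies in two subpopulations and ranks SNPs by that value. It assumes equally weighted subpopulations and the simple estimator Fst = (Ht - Hs) / Ht. The function names, SNP identifiers and frequencies are hypothetical, and the estimator actually used in the project is not stated in this abstract.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Wright's Fst for one biallelic SNP.

    p1 and p2 are the frequencies of the same allele in two
    subpopulations, which are weighted equally (an assumption).
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                         # monomorphic SNP: no differentiation
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-subpopulation
    return (h_t - h_s) / h_t


def rank_snps_by_fst(freqs: dict[str, tuple[float, float]],
                     min_fst: float = 0.0) -> list[tuple[str, float]]:
    """Return (snp, Fst) pairs with Fst >= min_fst, highest Fst first."""
    scored = [(snp, fst_biallelic(p1, p2)) for snp, (p1, p2) in freqs.items()]
    kept = [(snp, value) for snp, value in scored if value >= min_fst]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical allele frequencies in two subpopulations.
    example = {
        "snp_a": (0.2, 0.8),
        "snp_b": (0.5, 0.55),
        "snp_c": (0.0, 1.0),
    }
    for snp, value in rank_snps_by_fst(example, min_fst=0.1):
        print(f"{snp}\t{value:.3f}")
```

SNPs with a high Fst differ strongly in allele frequency between the subpopulations, which is one possible criterion for shortlisting them for follow-up work.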
To answer the main research question, much more analysis and research will be needed over the next few years. To answer the additional research question, the GBS workflow has been set up and optimized so that future GBS data can be processed smoothly. From the population structure analysis, a list of interesting SNPs has been compiled that will be used in further analyses.
Abstract Bachelor Project FBT 2018-2019: The new “Surf and Turf”: crustacean waste to promote lettuce growth
The European “Interreg 2 Seas Horti-BlueC” project is looking for environment-friendly alternative plant protection products and nutrients for greenhouse horticulture. Chitin is a promising alternative potting soil amendment. This key component of crustacean waste has previously been shown to have a positive impact on plant growth and development. However, the mode of action of chitin for this growth promotion is still unclear. Chitinolytic organisms, such as the fungus Mortierella, may play an important role.
In this research, the effect of chitin on lettuce growth was investigated in a pot experiment. Lettuce plants were grown for 7 weeks in a peat-based growing medium, either amended with 2% chitin or unamended. In parallel, the presence of Mortierella was determined by Sanger sequencing of the ITS region, and a first experiment on the chitinolytic activity of the fungus was carried out.
This research showed that chitin causes a remarkable growth promotion of lettuce. This effect is linked to an increased mineral nitrogen concentration in the growing medium and in the plants. The chlorophyll and magnesium content of the lettuce also increases significantly after chitin addition. As expected, there is also a clear increase in the presence of Mortierella, more specifically Mortierella hyalina, in the chitin mixtures. This fungus is known as a growth stimulator of Arabidopsis and is shown to play a role in the calcium pathway, one of the major components of chitin. An effect on chitin degradation through chitinase activity, however, was not shown. The growth stimulation of plants by chitin is therefore possibly linked to the presence of Mortierella hyalina and the release of nitrogen into the growing medium, which will be investigated further in future experiments.
Abstract advanced bachelor of bioinformatics 2018-2019 (1): The strength of microbial community analysis – a metabarcoding approach
One of the projects I worked on is “BI-O-PTIMAL@WORK”. In this project, different compost and management residue treatments were used. Samples were taken from these composts and management residues and sequenced with Illumina sequencing. I analyzed 26 of these samples: eight management residues and 18 composts. Composts are intended for plants that prefer soil with a neutral pH, while management residues are better suited to plants that prefer a more acidic soil. By looking at the biological and biochemical data, it can be determined which composts and management residues are the best candidates.
This sequencing data was first processed with the Bioconductor package “DADA2”, which goes from Illumina sequencing data to an ASV (= amplicon sequence variant) table with assigned taxonomy. The DADA2 pipeline goes as follows:
- Before starting the actual DADA2 pipeline, the primers and, in the case of fungi, also the adapters are removed. This is done with a bash script that is run on the ILVO genomics server.
- After removing the primers (and adapters), the quality of the sequences is visualized. This can be done on the command line with FastQC or with the DADA2 function plotQualityProfile.
- Then the data is filtered with filterAndTrim. Important parameters here are maxEE (= maximum number of expected errors allowed in a read; 3 for forward and reverse reads of bacteria and 2 for forward and reverse reads of fungi) and, in the case of bacteria, truncLen (reads are truncated at this length and shorter reads are removed; chosen from the position where the quality drops below 20, giving 263 for forward and 225 for reverse reads).
- With learnErrors the error rates are estimated. Important parameters here are nbases (= minimum number of bases to use for error rate learning; 1e8 for both bacteria and fungi) and MAX_CONSIST (= maximum number of times to run through the self-consistency loop; 25 for bacteria and 10 for fungi). With plotErrors these error estimates can be visualized.
- The next step is dereplication: identical sequences are collapsed into unique sequences, each with its abundance.
- Then a sequence table is constructed. If there are multiple runs, their sequence tables can be merged.
- Chimera removal is done with removeBimeraDenovo. Chimeras are reads that originate from more than one parent sequence. During the sequencing process, two incomplete pieces can be brought together and form an artificial sequence.
- Taxonomy is assigned with assignTaxonomy, at kingdom, phylum, class, order, family, genus and species level.
- Finally a count table is made.
The count table contains, per treatment, the number of times each sequence is detected, and is then used in the data analysis, which goes from data exploration to statistical testing and visualization of the taxonomic abundances. The results of this data analysis are as follows:
- In the data exploration we see some variation between the treatments. The density plots show a normal distribution for the bacterial data, but a left skew for the fungal data.
- For richness, evenness and diversity we see some differences between the treatments. The differences are larger for the fungal data.
- The PERMANOVA test is preceded by a betadisper test, which showed that the variation in the data differs significantly between the treatments. The PERMANOVA test is also significant, but because the betadisper result is significant, we cannot say with certainty that this significance reflects a biological difference between the treatments.
- In the PCoA plots with the biochemical data we see four clusters: two composed entirely of composts, one primarily of composts and one primarily of management residues. The biochemical data show that the compost clusters are determined by nitrogen-related parameters and carbonates for bacteria and, for fungi, by potassium and phosphorus instead of carbonates. For fungi, the management residues and a select number of composts are also determined by cellulose and lignin.
- In the phylum and family barplots, the only big difference is the presence of the phylum “Firmicutes” and its families in composts, most likely because of the way compost is made compared with management residues.
- After calculating the similarity between the treatments and visualizing it in a heatmap, we see that the overall similarity is very low. For bacteria in particular, the clustering is similar to that in the PCoA plot; for fungi much less so.
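The richness, evenness and diversity comparison mentioned in the results can be sketched from one sample's column of the count table. The abstract does not say which indices were used; the sketch below assumes observed richness, the Shannon index (natural logarithm) and Pielou's evenness, and the function names and counts are purely illustrative.

```python
import math


def richness(counts: list[int]) -> int:
    """Number of sequence variants observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon(counts: list[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over observed variants."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def pielou_evenness(counts: list[int]) -> float:
    """Pielou's evenness J = H' / ln(S); defined here as 0 when S <= 1."""
    s = richness(counts)
    if s <= 1:
        return 0.0
    return shannon(counts) / math.log(s)


if __name__ == "__main__":
    # Hypothetical ASV counts for two samples.
    samples = {"sample_even": [10, 10, 10, 10], "sample_skewed": [37, 1, 1, 1]}
    for name, counts in samples.items():
        print(name, richness(counts),
              round(shannon(counts), 3), round(pielou_evenness(counts), 3))
```

Both example samples have the same richness, but the skewed one has a much lower Shannon index and evenness, which is the kind of difference between treatments these metrics are meant to expose.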
Final conclusion: the different composts and management residues clearly differ, but with the statistical test we cannot say for certain that the difference is due to the biological data. In the PCoA we see a clear clustering of the composts and management residues based on chemical data. In the future this test will be done again at different timepoints to see whether they are stable.
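To make the filtering step of the DADA2 workflow in this abstract more concrete, the sketch below shows the idea behind truncLen and maxEE in plain Python. It is a simplified illustration, not the DADA2 implementation (which is an R package): the expected number of errors in a read is the sum of the per-base error probabilities 10^(-Q/10), computed after truncation. The function names and the example read are hypothetical.

```python
from typing import Optional


def expected_errors(quals: list[int]) -> float:
    """Expected number of errors in a read from its Phred quality scores."""
    return sum(10 ** (-q / 10) for q in quals)


def truncate_and_filter(seq: str, quals: list[int], trunc_len: int,
                        max_ee: float) -> Optional[tuple[str, list[int]]]:
    """Mimic the truncLen / maxEE logic on a single read.

    Reads shorter than trunc_len are discarded, longer reads are cut
    to trunc_len, and reads whose expected errors exceed max_ee after
    truncation are discarded. Returns None for a discarded read.
    """
    if len(seq) < trunc_len:
        return None
    seq, quals = seq[:trunc_len], quals[:trunc_len]
    if expected_errors(quals) > max_ee:
        return None
    return seq, quals


if __name__ == "__main__":
    # Hypothetical 10-base read whose quality drops towards the end.
    read = "ACGTACGTAC"
    quals = [35, 35, 35, 35, 30, 30, 20, 10, 10, 10]
    print(round(expected_errors(quals), 3))
    print(truncate_and_filter(read, quals, trunc_len=7, max_ee=0.1))
    print(truncate_and_filter(read, quals, trunc_len=10, max_ee=0.1))
```

Truncating before the low-quality tail lets the example read pass a strict maxEE, which is the reason the truncation length is chosen from where the quality drops.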
Abstract advanced bachelor of bioinformatics 2018-2019 (2): The hurdle of Metatranscriptomics
The soil microbiome carries out fundamental soil processes. Agricultural practices, such as tillage, can however threaten this important biome. The BOPACT (ImPACT van Compost en Teeltechniek op BOdem en PAthogenen) field is an ongoing field trial of ILVO that started in 2010 to understand the effects of agricultural practices (compost application, tillage practices and slurry application) on soil quality. Our understanding of how limited alterations such as compost application affect the soil microbiome is still limited. Therefore, data from the BOPACT field trial was analysed to study the effect of compost addition to soil on the bacterial composition, which can serve as a measure of soil quality, and on the bacterial functions.
Analysis of phospholipid-derived fatty acids (PLFAs) is used to determine the absolute biomass of living bacterial and fungal cells. This technique targets the fatty acid part of the phospholipids that make up the cellular membrane of viable cells. For the BOPACT data, PLFA analysis showed a 50% increase in absolute biomass. PLFA analysis, however, offers only limited taxonomic assignment. To gain deeper knowledge of the bacterial and fungal taxonomic groups present in the soil, metabarcoding can be used. Metabarcoding targets a specific gene in the genomes; for bacteria, the V3-V4 region of the 16S rRNA gene is targeted. The generated data simultaneously provide the community composition within the samples and the relative abundance of its members. After analyzing the data, we saw that the samples of the compost treatment contained more bacteria for some families. Depending on the R package used (DESeq or edgeR, both from Bioconductor), we saw an increase for between 6 and 17 families.
With these two techniques we can determine which micro-organisms change in absolute and relative terms after compost application, but to know how the active functions of the micro-organisms change, a technique at gene level is necessary. Metatranscriptomics was used to study the gene expression of the micro-organisms in the BOPACT samples. With this technique we can determine which enzymes and/or pathways are more prevalent in the bacteria when our samples are exposed to a change in environment. We compared three analysis methods to see which one was the most useful for soil organisms. After receiving our sequencing data, we analyzed the dataset with local tools, web-based tools and a combination of both. The local tool was a pipeline developed at ILVO, combining FastQC, Trimmomatic, PEAR and SortMeRNA with the HUMAnN2 analysis pipeline. The web-based tools were MG-RAST, COMAN and MGnify from EBI.
The combination we used consisted of the locally developed pipeline and MG-RAST. Our local approach gave us data annotated to level 4 Enzyme Commission numbers via UniRef50. Among the web-based tools, MG-RAST gave us a couple of choices of what to analyze, such as taxa or functions; COMAN used the KEGG database and NCBI’s COG database; and MGnify used the EBI databases. All these databases contain primarily human-associated bacteria. When comparing the results of all these tools using the edgeR differential expression analysis tool, we saw few significant changes when our samples were treated with compost. Later we received data from two other timepoints, and again we saw no significant changes between our control and compost-treated samples. When we started to compare the different timepoints, we saw that for multiple enzymes (using the local tool) there was a significant increase or decrease over time.
Conclusion: When soil is treated with compost, the absolute numbers of micro-organisms increase, while the relative numbers change only slightly. In the metatranscriptomics analysis, we saw no effect of the treatment, but the expression of enzymes differs significantly between samples taken at different timepoints. The problem with metatranscriptomics is that the databases currently used to annotate our sequences are based on a microbiome that is better researched than others: the microbiome of the human body.
Abstract traineeship advanced bachelor of bioinformatics 2017-2018: Soil metatranscriptomics: denoising, binning and analysis
Over a period of five years, ILVO gained experience in constructing and analyzing (meta)genome libraries. So far, we have used several genome-sequencing techniques, such as genotyping by sequencing, RNA sequencing and metabarcoding, to answer biological questions. Within the field of soil and substrate, we are mainly interested in the microbial community and its influence on plant growth, disease resistance and soil quality. So far, we have studied these communities by metabarcoding (bacteria, fungi) and have started to use whole genome shotgun sequencing to explore both the taxonomic composition and the functional potential of the soil’s microbial community.
However, to gain insight into the active functions of the microbial community, one should look at RNA. So far, the number of studies on soil and potting soil using metatranscriptomics, the study of all microbial RNA in the soil, is very limited. This is due not only to difficulties in RNA extraction, but especially to limitations and difficulties in data analysis.
At ILVO, we performed a metatranscriptomics experiment in which ten samples were studied. Six samples were taken from a field experiment at ILVO (soil). Here we tested statistically whether compost amendment has an effect on microbial functionality. The other four samples were derived from pots in which strawberry plants had been grown for thirteen weeks (potting soil and rhizosphere samples).
Within the project, I have explored the metatranscriptome libraries from primary data exploration until statistical analysis.
- I searched the literature for the most commonly used and newest methods and tested these for primary data analysis (e.g. QC, sequence trimming, read merging, rRNA removal, etc.). The optimal methods for this experiment were executed, evaluated and selected, and written into a Linux shell script for automation and reproducibility, which uses multithreading to speed up the analysis. Gene families and pathways were characterized for all samples. Normalisation and differential expression analysis are underway.
- Specific pathways and gene expression profiles were analysed using HMMER for nitrogen cycling, methanogenesis, carbon cycling and lignin degradation. Statistical analysis in R included normalization and testing of differences by applying a quasi-Poisson generalized linear model.
- I evaluated web-based tools for the analysis of metatranscriptome datasets, such as EBI metagenomics, MG-RAST and COMAN. In addition, pipelines such as HUMAnN2 (UniRef and MetaCyc mapping) were evaluated. The pipelines and the different databases and dependencies were installed on an HPC cluster for functional characterization, and the preprocessed reads were mapped against the UniRef protein databases. These hits were mapped against the MetaCyc pathway database.
- RNA viruses were detected in the soil samples using the VirusDetect pipeline against bacterial, fungal, invertebrate and plant RNA virus reference databases.
- As a side project, the same samples were characterized taxonomically by bacterial 16S V3-V4 rRNA and fungal ITS2 metabarcoding and analysed using the Bioconductor package DADA2.
Caroline De Tender