ILVO Melle, plant
Abstract advanced bachelor of bioinformatics 2018-2019 (1): The strength of microbial community analysis – a metabarcoding approach
One of the projects I worked on is “BI-O-PTIMAL@WORK”. In this project, different compost and management residue treatments were used. Samples were taken from these composts and management residues and sequenced with Illumina sequencing. I analyzed 26 of these samples: eight management residues and 18 composts. Composts are intended for plants that prefer soil with a neutral pH, while management residues are better suited to plants that prefer more acidic soil. The biological and biochemical data can be used to determine which composts and management residues are the best candidates.
The sequencing data were first processed with the Bioconductor package “DADA2”, which takes us from Illumina sequencing data to an ASV (amplicon sequence variant) table with assigned taxonomy. The DADA2 pipeline goes as follows:
- Before starting the actual DADA2 pipeline, the primers and, in the case of fungi, also the adapters are removed. This is done with a bash script run on the ILVO genomics server.
- After removing primers (and adapters), the quality of the sequences is visualized. This can be done on the command line with FastQC or in DADA2 with plotQualityProfile.
- The data are then filtered with filterAndTrim. Two parameters are important here: maxEE (the maximum number of expected errors allowed in a read; 3 for forward and reverse reads of bacteria and 2 for forward and reverse reads of fungi) and, in the case of bacteria, truncLen (reads are truncated at this length and shorter reads are discarded; chosen from the position where the quality drops below 20, which gave 263 for forward and 225 for reverse reads).
- With learnErrors the error rates are estimated. Here the parameters nbases (the minimum number of bases to use for error rate learning; 1e8 for both bacteria and fungi) and MAX_CONSIST (the maximum number of times to run through the self-consistency loop; 25 for bacteria and 10 for fungi) are important. These error estimates can be visualized with plotErrors.
- The next step is dereplication: identical reads are collapsed into unique sequences, each with its abundance.
- Then a sequence table is constructed. If there are multiple runs, their sequence tables can be merged.
- Chimera removal is done with removeBimeraDenovo. Chimeras are artificial sequences that originate from more than one parent sequence: during PCR amplification, two incomplete fragments can be joined into a single artificial sequence.
- Taxonomy is assigned to the sequences with assignTaxonomy, at kingdom, phylum, class, order, family, genus and species level.
- Finally, a count table is made.
This contains, per treatment, the number of times each sequence is detected. This count table is then used in the data analysis, which goes from data exploration to statistical testing and visualization of the taxonomic abundances. The results of this data analysis are as follows:
- In the data exploration we see some variation between the different treatments. The density plots show a normal distribution for the bacterial data, but a left skew for the fungal data.
- When determining the richness, evenness and diversity, we see some differences between the treatments. The differences are bigger for the fungal data.
- The PERMANOVA test is preceded by a betadisper test, which showed that the within-treatment variation (dispersion) differs significantly between the treatments. The PERMANOVA test is also significant, but because the betadisper result is significant, we cannot say with certainty that this significance reflects a biological difference between the treatments.
- In the PCoA plots with the biochemical data we see four clusters: two composed entirely of composts, one primarily of composts and one primarily of management residues. The biochemical data show that the compost clusters are determined by nitrogen-related parameters and carbonates for bacteria, and by nitrogen-related parameters, potassium and phosphorus (instead of carbonates) for fungi. For fungi, the management residues and a select number of composts are also determined by cellulose and lignin.
- In the phylum and family barplots, the only big difference we see is the presence of the phylum “Firmicutes” and its families in composts, most likely because of the way compost is made compared to management residues.
- After calculating the similarity between the different treatments and visualizing it in a heatmap, we see that the overall similarity is very low. For bacteria in particular, the clustering is similar to that in the PCoA plot; for fungi much less so.
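The richness, evenness, diversity and similarity measures mentioned in the results can be illustrated on a single row of the count table. The abstract does not say which indices were used, so the sketch below assumes the Shannon index for diversity, Pielou's index for evenness and Bray-Curtis for similarity; the function names are illustrative and the code is plain Python, not the R code used in the project.

```python
import math


def richness(counts):
    """Number of ASVs observed at least once in a sample."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over the observed ASVs."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's evenness J = H / ln(richness); defined as 0 for fewer than 2 ASVs."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


def bray_curtis_similarity(a, b):
    """1 - Bray-Curtis dissimilarity between two samples with the same ASV order."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 2 * shared / (sum(a) + sum(b))


if __name__ == "__main__":
    # Made-up counts for two samples over the same three ASVs.
    sample_a = [10, 0, 5]
    sample_b = [5, 5, 5]
    print(richness(sample_a), shannon(sample_a), pielou_evenness(sample_a))
    print(bray_curtis_similarity(sample_a, sample_b))
```

A similarity of 1 means two samples have identical counts and 0 means they share no ASVs, which is the scale on which a "very low" overall similarity in the heatmap should be read.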
Final conclusion: the different composts and management residues are clearly very different, but with the statistical test we cannot say for certain that the difference is due to the biological data. In the PCoA we see a clear clustering of the different composts and management residues based on the chemical data. In the future this test will be done again at different timepoints to see whether they are stable.
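Two of the DADA2 steps described in this abstract, the maxEE filter and dereplication, rest on simple calculations that can be sketched in a few lines. The code below is a minimal illustration in plain Python, not DADA2's implementation: it assumes that quality scores are given as Phred values, and the function names are hypothetical.

```python
from collections import Counter


def expected_errors(phred_scores):
    """Expected number of errors in a read: the sum of the per-base
    error probabilities 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_max_ee(phred_scores, max_ee):
    """Keep a read only if its expected errors do not exceed max_ee."""
    return expected_errors(phred_scores) <= max_ee


def dereplicate(reads):
    """Collapse identical reads into unique sequences with their abundance."""
    return Counter(reads)


if __name__ == "__main__":
    # A made-up read of 100 bases, all with quality 20 (1% error per base).
    print(expected_errors([20] * 100))
    print(passes_max_ee([20] * 100, max_ee=2))
    print(dereplicate(["ACGT", "ACGT", "TTGA"]))
```

This shows why maxEE is a count rather than a rate: a long read of moderate quality can accumulate more expected errors than a short read of the same quality.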
Abstract advanced bachelor of bioinformatics 2018-2019 (2): The hurdle of Metatranscriptomics
The soil microbiome carries out fundamental soil processes. Agricultural practices such as tillage can, however, threaten this important biome. The BOPACT (ImPACT van Compost en Teeltechniek op BOdem en PAthogenen) field is an ongoing ILVO field trial that started in 2010 to understand the effects of agricultural practices (compost application, tillage practices and slurry application) on soil quality. Our understanding of how limited alterations such as compost application affect the soil microbiome is still limited. Therefore, data from the BOPACT field trial were analysed to study the effect of compost addition to soil on the bacterial composition, which can serve as a measure of soil quality, and on the bacterial functions.
Analysis of phospholipid-derived fatty acids (PLFAs) is used to determine the absolute biomass of living bacterial and fungal cells. This technique targets the fatty acid part of the phospholipids that make up the cellular membrane of viable cells. For the BOPACT data, PLFA analysis showed a 50% increase in absolute biomass. PLFA analysis, however, offers only limited taxonomic assignment. To gain deeper knowledge of the bacterial and fungal taxonomic groups present in the soil, metabarcoding can be used. Metabarcoding targets a specific gene in the genomes; for bacteria, the V3-V4 region of the 16S rRNA gene is targeted. The generated data simultaneously provide the community composition within the samples and the relative abundance of its members. After analyzing the data, we saw that for some families the samples of the compost treatment contained more bacteria. Depending on the R package used (DESeq or edgeR, both from Bioconductor), we saw an increase for between 6 and 17 families.
With these two techniques we can determine which micro-organisms change in absolute and relative terms after compost application, but to know how the active functions of the micro-organisms change, a technique at the gene level is necessary. Metatranscriptomics was used to study the gene expression of the micro-organisms in the BOPACT samples. With this technique we can determine which enzymes and/or pathways are more prevalent in the bacteria when our samples are exposed to a change in environment. We compared three analysis approaches to see which was the most useful for soil organisms: after receiving our sequencing data, we analyzed the dataset with local tools, with web-based tools and with a combination of both. The local tool was a pipeline developed at ILVO, combining FastQC, Trimmomatic, PEAR and SortMeRNA with the HUMAnN2 analysis pipeline. The web-based tools were MG-RAST, COMAN and MGnify from EBI.
The combination we used consisted of the locally developed pipeline and MG-RAST. Our local approach gave us data annotated to level 4 enzyme commission (EC) numbers via uniprot 50. Among the web-based tools, MG-RAST gave us a couple of choices of what to analyze, such as taxa or functions; COMAN used the KEGG database and NCBI’s COG database; and MGnify used the EBI databases. All these databases contain primarily human-associated bacteria. When comparing the results of all these tools with the edgeR differential expression analysis tool, we saw few significant changes when our samples were treated with compost. Later we received data from two other timepoints, and again we saw no significant changes between our control and compost-treated samples. When we started to compare the different timepoints, we saw that for multiple enzymes (using the local tool) there was a significant increase or decrease over the timepoints.
Conclusion: when we compare compost-treated soil with the control, we see that the absolute number of micro-organisms increases, while the relative numbers change only slightly. As for the metatranscriptomics analysis, we saw no effect of the treatment, but the expression of enzymes differs significantly between samples taken at different timepoints. The problem with metatranscriptomics is that the databases currently used to annotate our sequences are based on a microbiome that is better researched than others: the microbiome of the human body.
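The conclusion that absolute numbers can rise while relative numbers barely change follows from how relative abundance is computed. The sketch below is a plain-Python illustration with invented counts (they are not BOPACT data) and hypothetical family names: every family grows by the same factor, so total biomass goes up by 50% while the proportions stay the same.

```python
def relative_abundance(counts):
    """Convert absolute counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


if __name__ == "__main__":
    # Invented absolute abundances for three hypothetical families.
    control = {"family_A": 600, "family_B": 300, "family_C": 100}
    # Every family increases by 50%, as an idealised treatment effect.
    compost = {taxon: n * 1.5 for taxon, n in control.items()}

    print(sum(compost.values()) / sum(control.values()))  # ratio of totals
    print(relative_abundance(control))
    print(relative_abundance(compost))
```

This is why an absolute measure such as PLFA and a relative measure such as metabarcoding complement each other: relative data alone cannot reveal a uniform increase in biomass.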
Abstract traineeship advanced bachelor of bioinformatics 2017-2018: Soil metatranscriptomics: denoising, binning and analysis
Over a period of five years, ILVO has gained experience in constructing and analyzing (meta)genome libraries. We currently use several sequencing techniques, such as genotyping-by-sequencing, RNA sequencing and metabarcoding, to answer biological questions. Within the field of soil and substrate, we are mainly interested in the microbial community and its influence on plant growth, disease resistance and soil quality. So far, we have studied these communities by metabarcoding (bacteria, fungi) and have started to use whole genome shotgun sequencing to explore both the taxonomical composition and the functional potential of the soil’s microbial community.
However, to gain insight into the active functions of the microbial community, one should look at RNA. So far, the number of studies on soil and potting soil using metatranscriptomics, the study of all microbial RNA in the soil, is very limited. This is due not only to difficulties in RNA extraction, but especially to limitations and difficulties in data analysis.
At ILVO, we performed a metatranscriptomics experiment in which ten samples were studied. Six samples were taken from a field experiment at ILVO (soil). Here we tested statistically whether compost amendment has an effect on microbial functionality. The other four samples were derived from pots in which strawberry plants were grown for thirteen weeks (potting soil and rhizosphere samples).
Within the project, I explored the metatranscriptome libraries from primary data exploration through to statistical analysis.
- I searched the literature for the most commonly used and the newest methods for primary data analysis (e.g. QC, sequence trimming, read merging, rRNA removal) and tested them. The optimal methods for this experiment were executed, evaluated and selected, and written into a Linux shell script for automation and reproducibility; the script uses multithreading to speed up the analysis. Gene families and pathways were characterized for all samples. Normalisation and differential expression analysis are underway.
- Specific pathways and gene expression profiles were analysed using HMMER for nitrogen cycling, methanogenesis, carbon cycling and lignin degradation. Statistical analysis in R included normalization and testing for differences by applying a quasi-Poisson generalized linear model.
- I evaluated web-based tools for the analysis of metatranscriptome datasets, such as EBI metagenomics, MG-RAST and COMAN. In addition, pipelines such as HUMAnN2 (UniRef and MetaCyc mapping) were evaluated. The pipelines, the different databases and their dependencies were installed on an HPC cluster for functional characterization, and the preprocessed reads were mapped against the UniRef protein databases. These hits were then mapped against the MetaCyc pathway database.
- RNA viruses were detected in the soil samples using the VirusDetect pipeline against bacterial, fungal, invertebrate and plant RNA virus reference databases.
- As a side project, the same samples were characterized taxonomically by bacterial 16S V3-V4 rRNA and fungal ITS2 metabarcoding, with the data analysed using the Bioconductor package DADA2.
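The quasi-Poisson generalized linear model mentioned in the list above can be made concrete for the simplest case: one gene or enzyme and two groups (for example control versus compost). The sketch below is a plain-Python illustration under stated assumptions and not the R analysis used in the project: it ignores normalization offsets, assumes non-zero group means, and uses a hypothetical function name. With only a group term in the model, the fitted means are the group means, the dispersion is the Pearson chi-square divided by the residual degrees of freedom, and the Poisson standard error of the log fold change is inflated by the square root of that dispersion.

```python
import math


def quasipoisson_two_group(control, treated):
    """Fit log(mu) = b0 + b1 * treated for counts in two groups.

    Returns (log fold change b1, dispersion phi, standard error of b1).
    """
    mean_c = sum(control) / len(control)
    mean_t = sum(treated) / len(treated)
    log_fc = math.log(mean_t / mean_c)

    # Pearson chi-square: squared residuals scaled by the Poisson variance (= mean).
    pearson = sum((y - mean_c) ** 2 / mean_c for y in control)
    pearson += sum((y - mean_t) ** 2 / mean_t for y in treated)

    # Residual degrees of freedom: observations minus the two fitted coefficients.
    df = len(control) + len(treated) - 2
    phi = pearson / df

    # Poisson variance of b1 is 1/sum(control) + 1/sum(treated); scale it by phi.
    se = math.sqrt(phi * (1 / sum(control) + 1 / sum(treated)))
    return log_fc, phi, se


if __name__ == "__main__":
    # Made-up counts for one enzyme in three control and three treated samples.
    log_fc, phi, se = quasipoisson_two_group([10, 12, 8], [20, 24, 16])
    print(log_fc, phi, se, log_fc / se)
```

The ratio of the log fold change to its standard error is the test statistic; a dispersion above 1 widens the standard error compared with a plain Poisson model, which is the reason for choosing the quasi-Poisson family for overdispersed counts.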
Caroline De Tender