KU Leuven, VIB Nucleomics Core
Abstract Traineeship 2020-2021: DEVELOPMENT OF A WORKFLOW FOR PRE-PROCESSING SINGLE-CELL RNA SEQUENCING DATA
The Nucleomics Core is a VIB core facility providing sequencing services for the scientific community. The goal of this traineeship project was to develop a pipeline for single-cell data analysis, starting from the raw data generated by the Illumina instruments located in the Nucleomics Core. In my project I focused on 10X single-cell RNA-sequencing data, with the aim of automating the data pre-processing for different customers and experiments.
A workflow was created to demultiplex the raw sequencing data, generate summary plots and tables to assess the pooling quality, map the reads to a reference genome and count the reads, and summarize the results for all the samples per experiment. Finally, the summary precomputes all information needed for re-pooling after a shallow sequencing run according to the target sequencing depth.
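The re-pooling step can be illustrated with a minimal Python sketch. It is not the workflow's actual implementation. It assumes that the target depth is expressed as reads per cell and that each sample's share of the next run is proportional to the reads it still lacks after the shallow run. The function name `repooling_fractions` and all of its parameters are hypothetical.

```python
from typing import Dict


def repooling_fractions(
    observed_reads: Dict[str, int],
    n_cells: Dict[str, int],
    target_reads_per_cell: int,
) -> Dict[str, float]:
    """Return the share of the next sequencing run each sample should receive.

    Assumption (not from the source): a sample's share is proportional to the
    number of reads it still needs to reach the target depth.
    """
    missing = {}
    for sample, reads in observed_reads.items():
        target_total = n_cells[sample] * target_reads_per_cell
        # A sample that already reached its target needs no further reads.
        missing[sample] = max(target_total - reads, 0)

    total_missing = sum(missing.values())
    if total_missing == 0:
        # Every sample is already at or above the target depth.
        return {sample: 0.0 for sample in missing}
    return {sample: m / total_missing for sample, m in missing.items()}


if __name__ == "__main__":
    # Hypothetical shallow-run results for two samples of 1000 cells each.
    shallow = {"A": 20_000_000, "B": 40_000_000}
    cells = {"A": 1000, "B": 1000}
    print(repooling_fractions(shallow, cells, target_reads_per_cell=50_000))
    # prints {'A': 0.75, 'B': 0.25}
```

In this example sample A still needs 30 million reads and sample B 10 million, so A would receive three quarters of the re-pooled run.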
The VIB Nucleomics Core makes use of a private GenePattern server. GenePattern provides a user-friendly web interface for running programs and is used at the core facility both by the bioinformaticians, to run routine analyses, and by the wet-lab technicians.
The workflow was integrated into the GenePattern server. A GenePattern module was created for each workflow step, and the modules can be linked together in GenePattern to form a pipeline. The pipeline can also be run on the command line as a Nextflow script. Nextflow processes the reads of multiple samples in parallel, which reduces the analysis runtime.
The workflow is easily portable to different servers. To facilitate this, a conda environment was created that contains all R packages used in the R scripts, so that dependencies can be managed consistently across environments. This portability is important for the Nucleomics Core, which intends to run the pipeline on virtual machines in the near future.
At the end of the traineeship period, I had created a user-friendly and portable workflow for pre-processing all single-cell RNA-sequencing run data, which can be automated and run from the command line. This workflow will be useful for the development of the sequencing activity of the Nucleomics Core.