BLT Stages

Universiteit Maastricht, department BiGCaT-Bioinformatics

Abstract, Advanced Bachelor of Bioinformatics 2019-2020: Building an automated workflow for RNA-seq data to determine gene-specific isoform expression

Recent large-scale genomic studies suggest that practically all human multi-exon genes can be spliced into numerous transcript isoforms. GENCODE v25 annotates 58,037 human genes and 198,093 transcripts, an average of about 3.4 annotated transcripts per gene; if only protein-coding genes are considered, the ratio rises to about 7:1. Even so, the number of annotated transcripts does not fully capture the complexity of alternative splicing in cells. Novel transcripts are regularly discovered by RNA sequencing (RNA-seq), which can detect transcript isoforms, gene fusions, single nucleotide variants and other features without relying on prior annotation. This leads to my research question: how effectively can an automated workflow help determine gene-specific isoform expression from RNA-seq data?
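The average of about 3.4 transcripts per gene is simply the ratio of the two GENCODE v25 counts quoted above. The short Python calculation below reproduces that division and uses only those two figures.

```python
# GENCODE v25 counts quoted in the abstract
annotated_genes = 58_037
annotated_transcripts = 198_093

# Average number of annotated transcripts per gene
transcripts_per_gene = annotated_transcripts / annotated_genes
print(f"{transcripts_per_gene:.2f} transcripts per gene")  # prints "3.41 transcripts per gene"
```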

RNA-seq has emerged as a powerful transcriptome profiling technology that allows in-depth analysis of alternative splicing. In a typical RNA-seq assay, extracted RNA is reverse-transcribed and fragmented into a cDNA library, which is then sequenced on a high-throughput sequencer. Transcript isoforms of the same gene are highly similar in sequence and share large overlapping regions, such as common exons. Identifying the true origin of each short read is therefore challenging, because a read from a shared region could have come from any of these isoforms.
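The ambiguity can be made concrete with a hypothetical gene that has four exons (E1 to E4) and three isoforms. The sketch below reduces each read to the chain of exons it covers, which is a simplification that ignores positions within exons and sequencing errors. A read counts as compatible with an isoform when its exons occur consecutively in that isoform. The names `ISOFORMS`, `contains_run` and `compatible_isoforms` are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical gene: exon order within each isoform
ISOFORMS: Dict[str, List[str]] = {
    "iso_A": ["E1", "E2", "E3", "E4"],
    "iso_B": ["E1", "E3", "E4"],  # skips exon E2
    "iso_C": ["E1", "E2", "E4"],  # skips exon E3
}


def contains_run(exons: List[str], run: Tuple[str, ...]) -> bool:
    """True if `run` occurs as consecutive exons in `exons`."""
    k = len(run)
    return any(tuple(exons[i:i + k]) == run for i in range(len(exons) - k + 1))


def compatible_isoforms(read_exons: Tuple[str, ...]) -> List[str]:
    """Isoforms whose exon chain contains the read's exons consecutively."""
    return [name for name, exons in ISOFORMS.items() if contains_run(exons, read_exons)]


for read in [("E1",), ("E2",), ("E1", "E3"), ("E2", "E4")]:
    print(read, "->", compatible_isoforms(read))
```

A read lying entirely in E1 fits every isoform. A read spanning the E1-E3 junction can only come from `iso_B`. Junction-spanning reads are therefore the most informative, and the remaining ambiguous reads have to be resolved during quantification.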

Many tools can align raw reads to a reference genome or transcriptome. For this project I will use Spliced Transcripts Alignment to a Reference (STAR), a software package that performs highly accurate and ultra-fast alignment of RNA-seq reads to a reference genome. In addition, it detects both annotated and novel splice junctions, and it can discover more complex RNA arrangements such as chimeric and circular RNAs. Because it can align spliced sequences of any length with moderate error rates, STAR also scales to emerging sequencing technologies.
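STAR reports the splice junctions it detects in a tab-separated file, `SJ.out.tab`. The sketch below shows one way to separate annotated junctions from novel ones. It assumes the column layout described in the STAR manual: columns 2 and 3 give the first and last intronic base, column 6 is 1 for junctions present in the supplied annotation and 0 otherwise, and column 7 counts uniquely mapping reads that cross the junction. The three input rows are made-up toy values. The minimum of three unique reads for a novel junction is an arbitrary, hypothetical filter.

```python
import csv
import io
from typing import Iterable, List, NamedTuple, Tuple


class Junction(NamedTuple):
    chrom: str
    intron_start: int  # first intronic base (1-based)
    intron_end: int    # last intronic base (1-based)
    annotated: bool    # column 6 of SJ.out.tab == 1
    unique_reads: int  # column 7 of SJ.out.tab


def read_sj_tab(lines: Iterable[str]) -> List[Junction]:
    """Parse STAR's SJ.out.tab (tab-separated, no header line)."""
    junctions = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        junctions.append(Junction(row[0], int(row[1]), int(row[2]),
                                  row[5] == "1", int(row[6])))
    return junctions


def split_novel(junctions: List[Junction],
                min_unique: int = 3) -> Tuple[List[Junction], List[Junction]]:
    """Return (annotated, novel); novel junctions need >= min_unique unique reads."""
    annotated = [j for j in junctions if j.annotated]
    novel = [j for j in junctions if not j.annotated and j.unique_reads >= min_unique]
    return annotated, novel


# Made-up toy rows in SJ.out.tab layout
example = io.StringIO(
    "chr1\t14830\t14969\t2\t2\t1\t12\t0\t38\n"
    "chr1\t15039\t15795\t2\t2\t0\t5\t1\t30\n"
    "chr1\t16766\t16857\t2\t2\t0\t1\t0\t12\n"
)
annotated, novel = split_novel(read_sj_tab(example))
print(len(annotated), "annotated,", len(novel), "novel")  # prints "1 annotated, 1 novel"
```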

After mapping, a suitable quantification tool must be chosen. Several packages have been developed to quantify expression at the transcript level, and this project concentrates on RNA-Seq by Expectation-Maximization (RSEM). RSEM runs the Expectation-Maximization (EM) algorithm iteratively to assign reads, probabilistically, to the isoforms from which they originate.
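The idea behind EM-based isoform quantification can be illustrated with a deliberately simplified model. This is not RSEM's implementation, which models more effects (for example, fragment-length distributions). The sketch assumes the following:

- Each read is reduced to the list of isoforms it is compatible with.
- Reads start uniformly along an isoform, so P(read | isoform i) = 1 / effective length of i.

On each iteration, the E-step shares every read among its compatible isoforms in proportion to their current abundance divided by length. The M-step then updates the read fractions from those expected counts. The function names and the example reads are hypothetical.

```python
from typing import Dict, List


def em_read_fractions(reads: List[List[str]], eff_len: Dict[str, float],
                      max_iter: int = 1000, tol: float = 1e-10) -> Dict[str, float]:
    """Toy EM estimate of the fraction of reads generated by each isoform."""
    reads = [r for r in reads if r]                   # drop reads compatible with nothing
    isoforms = list(eff_len)
    rho = {i: 1.0 / len(isoforms) for i in isoforms}  # uniform starting guess
    for _ in range(max_iter):
        expected = {i: 0.0 for i in isoforms}
        # E-step: split each read over its compatible isoforms
        for compat in reads:
            w = {i: rho[i] / eff_len[i] for i in compat}
            total = sum(w.values())
            for i in compat:
                expected[i] += w[i] / total
        # M-step: read fractions = expected read counts / number of reads
        new_rho = {i: expected[i] / len(reads) for i in isoforms}
        delta = max(abs(new_rho[i] - rho[i]) for i in isoforms)
        rho = new_rho
        if delta < tol:
            break
    return rho


def transcript_fractions(rho: Dict[str, float], eff_len: Dict[str, float]) -> Dict[str, float]:
    """Convert read fractions to transcript (molecule) fractions by correcting for length."""
    scaled = {i: rho[i] / eff_len[i] for i in rho}
    norm = sum(scaled.values())
    return {i: v / norm for i, v in scaled.items()}


# Hypothetical reads: 6 unique to iso_A, 2 unique to iso_B, 8 shared by both
reads = [["iso_A"]] * 6 + [["iso_B"]] * 2 + [["iso_A", "iso_B"]] * 8
lengths = {"iso_A": 1000.0, "iso_B": 1000.0}
rho = em_read_fractions(reads, lengths)
print({i: round(v, 3) for i, v in rho.items()})  # prints {'iso_A': 0.75, 'iso_B': 0.25}
```

With equal lengths, the shared reads end up split 6:2, matching the ratio of the unique reads. With unequal lengths, `transcript_fractions` would additionally correct for longer isoforms producing more reads per molecule.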

By combining these tools in an automated workflow, we aim to make isoform expression easier to determine, easier to interpret and clearly visualized. The image on the second page illustrates the pipeline we are following.
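The sketch below shows one way to chain STAR and RSEM for each sample. It assumes the following:

- Reads are paired-end and gzipped.
- A STAR index built with the annotation GTF already exists.
- An RSEM reference has been prepared once beforehand with `rsem-prepare-reference --gtf`.

Sample names, directories and the thread count are hypothetical. STAR's `--quantMode TranscriptomeSAM` writes `Aligned.toTranscriptome.out.bam`, which contains alignments in transcript coordinates. `rsem-calculate-expression --alignments` can quantify that file, writing `<sample>.isoforms.results`. Check the exact options against the installed versions of both tools.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Tuple


@dataclass
class Step:
    name: str
    cmd: List[str]
    output: Path  # file whose existence marks the step as already done


def sample_steps(sample: str, r1: str, r2: str, star_index: str,
                 rsem_ref: str, outdir: Path, threads: int = 8) -> List[Step]:
    """STAR alignment followed by RSEM quantification for one paired-end sample."""
    prefix = outdir / f"{sample}_"
    tx_bam = Path(f"{prefix}Aligned.toTranscriptome.out.bam")
    align = Step("star", [
        "STAR", "--runThreadN", str(threads), "--genomeDir", star_index,
        "--readFilesIn", r1, r2, "--readFilesCommand", "zcat",
        "--outFileNamePrefix", str(prefix),
        "--outSAMtype", "BAM", "SortedByCoordinate",  # genome BAM, e.g. for a genome browser
        "--quantMode", "TranscriptomeSAM",            # transcript-coordinate BAM for RSEM
    ], tx_bam)
    quant = Step("rsem", [
        "rsem-calculate-expression", "--paired-end", "--alignments",
        "--num-threads", str(threads), str(tx_bam), rsem_ref, str(outdir / sample),
    ], outdir / f"{sample}.isoforms.results")
    return [align, quant]


def run(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Run steps in order, skipping any whose output already exists; return a log."""
    log = []
    for step in steps:
        if step.output.exists():
            log.append(f"skip {step.name}: {step.output} exists")
            continue
        log.append(" ".join(step.cmd))
        if not dry_run:
            step.output.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(step.cmd, check=True)  # stop on the first failing step
    return log


# Hypothetical sample sheet; dry_run=True only prints the commands
samples: Dict[str, Tuple[str, str]] = {
    "sample1": ("sample1_R1.fastq.gz", "sample1_R2.fastq.gz"),
}
for name, (r1, r2) in samples.items():
    for line in run(sample_steps(name, r1, r2, "star_index", "rsem_ref/human", Path("results"))):
        print(line)
```

Skipping steps whose output already exists lets an interrupted run resume without redoing finished samples. Dedicated workflow managers such as Snakemake or Nextflow provide the same idea more robustly.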

Address

Universiteitssingel 50

6229 ER Maastricht

Netherlands

Contacts

Traineeship supervisor

Lars Eijssen

+31 43 388 1187

l.eijssen@maastrichtuniversity.nl
