University College Cork, The LAboratory of Post-Transcriptional control and bioInformatics (LAPTI), Cork, Ireland

Abstract internship (advanced bachelor of bioinformatics) 2016-2017: Upgrade of the GWIPS-viz ribo-seq genome browser

The goal of my internship is to upgrade the GWIPS-viz Ribo-seq genome browser.

GWIPS-viz (Genome Wide Information on Protein Synthesis) is a freely available online genome browser for analysing ribosome profiling (Ribo-seq) data. The data are generated by deep sequencing of ribosome-protected mRNA fragments, which makes it possible to quantify the mRNA transcripts that are actively being translated in the cell.

GWIPS-viz (http://gwips.ucc.ie) is a customized version of the UCSC (University of California Santa Cruz) Genome Browser specifically tailored for Ribo-seq data.

This internship has two major aspects: first, integrating the latest functionality from the UCSC Genome Browser, and second, ensuring that future upgrades are more seamless and require less time.

Since the last GWIPS-viz upgrade in September 2015, the UCSC Genome Browser development team have added new functionalities like the “multi-region view” which would be particularly beneficial for exploring the Ribo-seq data hosted on GWIPS-viz. This functionality makes it possible to automatically exclude the display of introns so that only exonic regions are displayed in the genome browser window. This results in a more convenient display for Ribo-seq data, because Ribo-seq data is not expected to be found in intronic regions.
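The effect of hiding introns can be illustrated with a small coordinate mapping. The sketch below is not the UCSC Genome Browser implementation of the multi-region view. It assumes a hypothetical function `to_exon_only` and exons given as sorted, non-overlapping, half-open `(start, end)` intervals, and it only shows how a genomic position collapses onto an intron-free axis.

```python
from typing import List, Optional, Tuple


def to_exon_only(position: int, exons: List[Tuple[int, int]]) -> Optional[int]:
    """Map a genomic position onto an axis from which introns are removed.

    `exons` is assumed to be sorted, non-overlapping and half-open
    (start inclusive, end exclusive). Returns None when the position
    falls in an intron or outside the gene.
    """
    collapsed = 0  # total exonic length seen before the current exon
    for start, end in exons:
        if start <= position < end:
            return collapsed + (position - start)
        collapsed += end - start
    return None


# Hypothetical two-exon gene with a 300 nt intron between the exons.
example_exons = [(100, 200), (500, 600)]
print(to_exon_only(150, example_exons))  # position inside the first exon
print(to_exon_only(500, example_exons))  # first base of the second exon
print(to_exon_only(300, example_exons))  # intronic position, not displayed
```

In such a display the second exon starts directly after the first, so footprint density across a coding region can be read without scrolling over empty intronic sequence.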

For the second aspect of this internship, I will focus on how to ensure that future upgrades can be carried out faster than previous releases. There are two ways to approach an upgrade of a mirrored site. One is to modify the existing GWIPS-viz browser to provide the same functionality as the UCSC Genome Browser. The other is to modify the UCSC Genome Browser C source code to look more like the GWIPS-viz browser. The first approach is more likely to ensure that future upgrades are quicker.

The upgrade is not a one-step process. Reaching the latest version of the GWIPS-viz browser will require extensive testing, analysis of failures and implementation of modifications.

Abstract 2018-2019: To update and automate the integration of NCBI and Gencode gene annotations for the GWIPS-viz browser (https://gwips.ucc.ie/)

The GWIPS-viz browser (https://gwips.ucc.ie/) is an online genome browser specifically tailored for exploring ribosome profiling (Ribo-seq) data and their RNA-seq controls. As GWIPS-viz is a partial mirror of the UCSC Genome Browser, its functionality is very similar, with some additional functionality for exploring Ribo-seq data. For the genomes that are common to GWIPS-viz and the UCSC Genome Browser (hg38, mm10, sacCer3, rn6), the gene annotations (UCSC RefSeq) were previously updated automatically in GWIPS-viz by in-house scripts. However, changes in the configuration of these gene annotations on the UCSC Genome Browser broke the GWIPS-viz automated update pipeline. The aim of my project was to troubleshoot the causes and then resolve the issues by modifying and updating the existing scripts so that the gene annotations update automatically in future. After exploring the previous automation scripts and the current UCSC RefSeq gene annotation configuration on the UCSC Genome Browser, I found that the SQL database tables had changed substantially from the previous configuration. Hence, rather than trying to update the previous configuration, I familiarised myself with the recent configuration of the NCBI RefSeq and Gencode annotation tables and downloaded the latest versions of these tables. Next, I wrote new automation scripts for the NCBI RefSeq and Gencode annotations so that both sets of gene annotations are updated automatically.

Abstract traineeship advanced bachelor of bioinformatics 2017-2018: RiboSeq Unit Step Transformation (RUST) of bacterial data

The ribosome profiling (RiboSeq) technique was developed in 2009 to assess gene expression at the translation level at the scale of the entire cell transcriptome [1]. The ability of the technique to produce an unprecedentedly detailed quantitative characterization of gene expression on a global scale has made it popular. Applications of the technique have prompted researchers to reconsider current views on how protein-coding information is organized in genomes and on how protein synthesis is carried out and regulated in cells [2-4]. However, ribosome profiling data are highly heterogeneous and difficult to analyze because of sporadic technical noise. A normalization technique, RUST, was recently developed at the LAPTI lab, University College Cork. RUST circumvents these problems and can be applied to RiboSeq data obtained in eukaryotic organisms [5]. RUST can be used for assessing the quality of datasets and for estimating how properties of mRNA affect the speed of elongating ribosomes. RUST profiles for many of the eukaryotic RiboSeq datasets are available on GWIPS-viz (https://gwips.ucc.ie) [6], a tool based on the UCSC Genome Browser for the analysis and visualization of data obtained with the ribosome profiling technique.

Because of intrinsic differences between bacterial and eukaryotic translation, in particular because of widespread mRNA-rRNA interactions [7], direct application of RUST to bacterial RiboSeq data is not practical. Therefore, the first goal of the project was to adapt RUST for bacterial data. RUST profiles using a static 3’ offset of 12 nucleotides for all read lengths were generated for 25 bacterial datasets (E. coli, etc.) and compared with RUST profiles generated for the same datasets using a variable 3’ offset for each individual read length. No large improvement from the use of a variable offset was observed.
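The difference between a static and a variable 3’ offset can be made concrete with a minimal sketch. It is not the RUST code itself: the function name `a_site_position`, the 0-based inclusive coordinates, and the convention that the offset is counted from the 3’ end of the read towards its 5’ end are all assumptions made for illustration, and the per-length offset table is hypothetical.

```python
from typing import Dict, Optional


def a_site_position(start: int, end: int, strand: str,
                    offsets: Optional[Dict[int, int]] = None,
                    default_offset: int = 12) -> int:
    """Infer the A-site coordinate of an aligned read from its 3' end.

    `start` and `end` are assumed to be 0-based inclusive genomic
    coordinates. With `offsets=None` the same static offset is used for
    every read length; otherwise the offset is looked up per read length
    and falls back to `default_offset` for lengths not in the table.
    """
    length = end - start + 1
    offset = default_offset if offsets is None else offsets.get(length, default_offset)
    if strand == "+":
        return end - offset    # 3' end is the highest coordinate
    if strand == "-":
        return start + offset  # 3' end is the lowest coordinate
    raise ValueError(f"unknown strand: {strand!r}")


# Static offset of 12 nt for a 28 nt read on each strand.
print(a_site_position(100, 127, "+"))
print(a_site_position(100, 127, "-"))

# Hypothetical variable offsets keyed by read length.
variable_offsets = {28: 11, 30: 13}
print(a_site_position(100, 127, "+", offsets=variable_offsets))
```

With a static offset only `default_offset` matters. The variable scheme needs one calibrated value per read length, which is extra work that the comparison above did not reward with better profiles.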

A broader distribution of read lengths, in particular a higher number of short reads (<25 nt), was observed for bacterial RiboSeq datasets compared with eukaryotic RiboSeq datasets. Therefore, RUST was tested on a dataset in which reads with a minimal length of 15 nt were mapped to the reference genome, and the result was compared with the in-house default minimal read length of 25 nt. The number of mapped reads was larger, but the RUST profiles showed no distinct changes. The decision was made to use a static 3’ offset of 12 nt and to keep the default minimal read length of 25 nt. Hence, remapping the bacterial RiboSeq datasets for all 25 studies was considered unnecessary.

After completion of the exploratory analysis, RUST was applied to all 25 bacterial datasets in GWIPS-viz. Users can explore the profiles in the same way as was already possible for eukaryotic datasets.

Figure 1 shows an example of a RUST profile. It consists of three plots: the metagene footprint profile with the Kullback-Leibler divergence (top), a plot with relative coding rates (bottom left) and a panel that shows the triplet periodicity (bottom right).

In the top panel, each gray line represents a codon and its RUST ratio, which is the normalized ratio of observed to expected RUST values, plotted on a log scale. The 0 coordinate is the A-site of the ribosome and the gray area shows the footprint of the ribosome. The Kullback-Leibler divergence at a particular position indicates the variation in Ribo-seq footprint occupancy across all sense codons. The higher the Kullback-Leibler divergence, the less uniform the distribution of RUST values is at the corresponding position. Two codons can influence each other; therefore, the Kullback-Leibler divergence for adjacent codons is shown as well. Ideally, the highest divergence should be at the A-site, reflecting that the decoding center of the ribosome contributes most to the divergence in footprint occupancy.
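The quantities in the top panel can be sketched in a few lines. This is a simplified illustration and not the published RUST implementation: it assumes a single transcript, considers only the codon at the A-site, takes the expected value to be the mean RUST value over all codons, and computes the Kullback-Leibler divergence in bits after normalizing both inputs to sum to one. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def unit_step(densities: Sequence[float]) -> List[int]:
    """Binarize footprint densities: 1 if above the transcript mean, else 0."""
    mean = sum(densities) / len(densities)
    return [1 if d > mean else 0 for d in densities]


def rust_ratios(codons: Sequence[str], densities: Sequence[float]) -> Dict[str, float]:
    """Observed-to-expected RUST ratio per codon type (simplified, A-site only)."""
    rust = unit_step(densities)
    expected = sum(rust) / len(rust)  # assumed expectation: overall mean RUST value
    if expected == 0:
        return {}  # flat profile, no ratio can be formed
    sums: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for codon, value in zip(codons, rust):
        sums[codon] += value
        counts[codon] += 1
    return {codon: (sums[codon] / counts[codon]) / expected for codon in counts}


def kl_divergence(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; all expected values must be positive."""
    total_o, total_e = sum(observed), sum(expected)
    divergence = 0.0
    for o, e in zip(observed, expected):
        p, q = o / total_o, e / total_e
        if p > 0:  # terms with p == 0 contribute nothing
            divergence += p * math.log2(p / q)
    return divergence


# Toy transcript: the AAA codons carry more footprints than the CCC codons.
ratios = rust_ratios(["AAA", "CCC", "AAA", "CCC"], [10, 0, 8, 2])
print(ratios)
print(kl_divergence([1.0, 0.0], [0.5, 0.5]))
```

A position where every codon type has the same RUST value gives a divergence of zero, while a position where occupancy depends strongly on codon identity, as expected at the A-site, gives a high divergence.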

In the bottom left panel, the relative RUST ratio for each amino acid is presented. This plot was optimized during the project. It is now possible to distinguish the different synonymous codons for each amino acid.

The panel in the bottom right was also modified to include reads from 36 to 40 nucleotides.

Summary of bachelor project 2012-2013: Development of GWIPS-viz genome browser

Ribosome profiling is a new technique to analyse gene expression in both prokaryotes and eukaryotes. Ribosome profiling experiments provide a huge amount of genome-wide information on protein synthesis (GWIPS).

This bachelor project involves part of the development of the online GWIPS-viz genome browser, along the lines of the UCSC Genome Browser. The browser allows users to browse ribosome profiling data for human, mouse, zebrafish, nematode and yeast. This report describes the ribosome profiling technique, the GWIPS-viz genome browser, the alignment pipeline and track development.

Tags: bioinformatics

Address

Western Gateway Building, School of Biochemistry and Cell Biology

Cork

Ireland

Contacts

Traineeship supervisor

Audrey Michael

Traineeship supervisor

Baranov Pavel

p.baranov@ucc.ie
