University College Cork, The LAboratory of Post-Transcriptional control and bioInformatics (LAPTI), Cork, Ireland
Abstract internship (advanced bachelor of bioinformatics) 2016-2017: Upgrade of the GWIPS-viz Ribo-seq genome browser
The goal of my internship is to upgrade the GWIPS-viz Ribo-seq genome browser.
GWIPS-viz (Genome Wide Information on Protein Synthesis) is a freely available online genome browser for analysing ribosome profiling (Ribo-seq) data. The data is generated by deep sequencing of ribosome-protected mRNA fragments, which makes it possible to quantify the mRNA transcripts that are actively being translated in the cell.
GWIPS-viz (http://gwips.ucc.ie) is a customized version of the UCSC (University of California Santa Cruz) Genome Browser specifically tailored for Ribo-seq data.
This internship has two major aspects: first, integrating the latest functionalities from the UCSC Genome Browser, and second, ensuring that future upgrades are more seamless and require less time.
Since the last GWIPS-viz upgrade in September 2015, the UCSC Genome Browser development team have added new functionalities such as the “multi-region view”, which would be particularly beneficial for exploring the Ribo-seq data hosted on GWIPS-viz. This functionality makes it possible to automatically hide introns so that only exonic regions are displayed in the genome browser window. The result is a more convenient display for Ribo-seq data, because Ribo-seq reads are not expected to map to intronic regions.
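The idea behind an exon-only display can be sketched as a coordinate transformation in which intronic intervals are collapsed. The Python snippet below is only an illustration under stated assumptions: the function name `genomic_to_exonic` and the exon coordinates are hypothetical, and this is not the UCSC Genome Browser implementation of the multi-region view.

```python
from typing import List, Optional, Tuple


def genomic_to_exonic(pos: int, exons: List[Tuple[int, int]]) -> Optional[int]:
    """Map a genomic position to its offset in an exon-only view.

    `exons` is assumed to be a sorted list of non-overlapping,
    half-open (start, end) intervals. Returns None when the position
    falls in an intron or outside the listed exons.
    """
    offset = 0  # number of exonic bases seen before the current exon
    for start, end in exons:
        if start <= pos < end:
            return offset + (pos - start)
        offset += end - start
    return None


# Hypothetical exon coordinates, for illustration only.
exons = [(100, 200), (500, 650), (900, 1000)]
print(genomic_to_exonic(150, exons))  # 50
print(genomic_to_exonic(300, exons))  # None (intronic)
print(genomic_to_exonic(510, exons))  # 110
```

In such a view, reads that map to neighbouring exons appear next to each other, which is the property that makes the display convenient for Ribo-seq data.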
For the second aspect of this internship, I will focus on how to guarantee that future updates can be carried out faster than in previous releases. There are two ways to look at an upgrade of a mirrored site. One way is to modify the existing GWIPS-viz browser to provide the same functionalities as the UCSC Genome Browser. The second way is to modify the UCSC Genome Browser C source code to look more like the GWIPS-viz browser. The first approach is more likely to ensure that future updates are quicker.
The upgrade is not a one-step process. It will require a lot of testing, analysis of failures and implementation of modifications to bring the GWIPS-viz browser up to the latest version.
The ribosome profiling (RiboSeq) technique was developed in 2009 to assess gene expression at the translation level at the scale of the entire cell transcriptome. The ability of the technique to produce an unprecedentedly detailed quantitative characterization of gene expression on a global scale has made it popular. Applications of the technique have prompted researchers to reconsider current views on how protein-coding information is organized in genomes and on how protein synthesis is carried out and regulated in cells [2-4]. However, ribosome profiling data are highly heterogeneous and difficult to analyze due to the presence of sporadic technical noise. A normalization technique (RUST) was recently developed at the LAPTI lab, University College Cork. RUST circumvents these problems and can be applied to RiboSeq data obtained in eukaryotic organisms. RUST can be used for assessing the quality of datasets and for estimating how properties of mRNA affect the speed of elongating ribosomes. RUST profiles for many of the eukaryotic RiboSeq datasets are available on GWIPS-viz (https://gwips.ucc.ie), a UCSC Genome Browser-based tool for the analysis and visualization of RiboSeq data obtained with the ribosome profiling technique.
Because of intrinsic differences between bacterial and eukaryotic translation, in particular because of widespread mRNA-rRNA interactions, direct application of RUST to bacterial RiboSeq data is not practical. Therefore, the first goal of the project was to adapt RUST for bacterial data. RUST profiles using a static 3’ offset of 12 nucleotides for all read lengths were generated for 25 bacterial datasets (E. coli, etc.) and compared to RUST profiles generated for the same datasets with a variable 3’ offset for each individual read length. No large improvement was observed from using a variable offset.
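The difference between a static and a read-length-specific 3’ offset can be sketched as follows. This is a minimal illustration, not the RUST code itself: the function name `assign_position`, the strand convention and the per-length offset table are hypothetical, and it is assumed that the offset is counted back from the 3’ end of the read towards its 5’ end.

```python
from typing import Dict, Optional

STATIC_OFFSET = 12  # static 3' offset in nucleotides, as used in the text


def assign_position(three_prime_end: int, strand: str, read_length: int,
                    per_length_offsets: Optional[Dict[int, int]] = None) -> int:
    """Return the genomic position a read is assigned to.

    With no offset table, the static offset is used for every read length.
    With a table, read lengths missing from it fall back to the static offset.
    """
    offset = STATIC_OFFSET
    if per_length_offsets is not None:
        offset = per_length_offsets.get(read_length, STATIC_OFFSET)
    if strand == "+":
        return three_prime_end - offset  # move upstream on the plus strand
    if strand == "-":
        return three_prime_end + offset  # upstream is higher on the minus strand
    raise ValueError(f"unknown strand: {strand!r}")


# Hypothetical per-length offsets, for illustration only.
variable = {24: 11, 28: 12, 32: 13}
print(assign_position(1000, "+", 32))            # 988 (static offset)
print(assign_position(1000, "+", 32, variable))  # 987 (variable offset)
print(assign_position(1000, "-", 24, variable))  # 1011
```

When the per-length offsets differ from 12 by only a nucleotide or so, the assigned positions shift very little, which is consistent with profiles that barely change between the two settings.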
A broader distribution of read lengths, in particular a higher number of short reads (<25 nt), was observed for bacterial RiboSeq datasets than for eukaryotic RiboSeq datasets. Therefore, RUST was tested on a dataset mapped to the reference genome with a minimal read length of 15 nt and compared to the in-house default minimal read length of 25 nt. The number of mapped reads was larger, but the RUST profiles showed no distinct changes. The decision was made to use a static 3’ offset of 12 nt and keep the default minimal read length of 25 nt. Hence, remapping all the bacterial RiboSeq datasets for the 25 studies was considered unnecessary.
After completion of the exploratory analysis, RUST was applied to all the bacterial datasets (25) in GWIPS-viz. Users can explore the profiles in the same way as for the eukaryotic datasets that were already available.
Figure 1 shows an example of a RUST profile. It consists of three plots: the metagene footprint profile with the Kullback-Leibler divergence (top), a plot with relative coding rates (bottom left) and a panel that shows the triplet periodicity (bottom right).
In the top panel, each gray line represents a codon and its RUST ratio, plotted on a log scale as the normalized ratio of observed to expected RUST values. The 0 coordinate is the A site of the ribosome and the gray area shows the footprint of the ribosome. The Kullback-Leibler divergence at a particular position indicates the variation in Ribo-seq footprint occupancy across all sense codons. The higher the Kullback-Leibler divergence, the less uniform the distribution of RUST values is at the corresponding position. Because two codons can influence each other, the adjacent Kullback-Leibler divergence is shown as well. Ideally, the highest divergence should be at the A site, reflecting that the decoding center of the ribosome contributes most to the divergence in footprint occupancy.
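To make the link between the Kullback-Leibler divergence and the uniformity of the per-codon values concrete, the sketch below applies the standard definition D(P||Q) = sum_i p_i log2(p_i / q_i). It is an assumption of this example that the values at one position are normalized to sum to 1 and compared against a reference distribution; the exact normalization used by RUST is not described here, and the input values are invented.

```python
import math
from typing import Sequence


def kl_divergence(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Kullback-Leibler divergence (in bits) of observed from expected.

    Both inputs are rescaled to sum to 1 before comparison. Expected
    values are assumed to be strictly positive.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    p_total = sum(observed)
    q_total = sum(expected)
    divergence = 0.0
    for obs, exp in zip(observed, expected):
        p = obs / p_total
        q = exp / q_total
        if p > 0:  # terms with p == 0 contribute nothing
            divergence += p * math.log2(p / q)
    return divergence


# Toy example with four "codons" and a uniform reference.
uniform = [1.0, 1.0, 1.0, 1.0]
print(kl_divergence([1.0, 1.0, 1.0, 1.0], uniform))  # 0.0
print(kl_divergence([4.0, 0.0, 0.0, 0.0], uniform))  # 2.0
```

A position where all codons have similar values gives a divergence close to 0, while a position dominated by a few codons gives a high divergence, which is the behaviour expected around the A site.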
In the bottom left panel, the relative RUST ratio for each amino acid is presented. This plot was optimized during the project. It is now possible to distinguish the different synonymous codons for each amino acid.
The panel in the bottom right was also modified to include reads from 36 to 40 nucleotides.
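The kind of tally that underlies a triplet periodicity panel can be sketched as a count of reads per sub-codon frame, split by read length. This is a hedged illustration only: the function name `frame_counts`, the input format and the example reads are hypothetical, and positions are assumed to be already assigned relative to a known coding sequence start.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def frame_counts(reads: Iterable[Tuple[int, int]],
                 cds_start: int) -> Dict[int, List[int]]:
    """Count reads per reading frame (0, 1, 2) for each read length.

    `reads` yields (assigned_position, read_length) pairs.
    """
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0, 0])
    for position, length in reads:
        frame = (position - cds_start) % 3  # non-negative in Python
        counts[length][frame] += 1
    return dict(counts)


# Invented reads, for illustration only.
reads = [(100, 28), (103, 28), (101, 28), (106, 36)]
print(frame_counts(reads, cds_start=100))
# {28: [2, 1, 0], 36: [1, 0, 0]}
```

A strong excess of one frame for a given read length indicates good triplet periodicity for reads of that length.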
Western Gateway Building, School of Biochemistry and Cell Biology