VIB – Bioinformatics Core (BITS)
- Linux machine
- Installation of the required software: HMM, emboss, perl, R
- Installation of the required databases, with provision for keeping them up to date
- Configuration for SSH access / remote desktop access, with emphasis on user-friendliness!
- Integration with the services that BITS offers
- Creation of a database (mysql) with information on human variation (variation files of approximately 300 MB per genome)
- Building an interface (php) to search the database in order to answer questions such as: given a particular disease, return all variations on chromosome 1 and compare them with a reference genome
- Paths to explore: integration with www.biotorrents.net
- Securing the BitTorrent server: tunnelling BitTorrent traffic over SSH
- BOINC server
- Implementation of a common interface
- Certain premade algorithms …
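The database query described in the list above (all variations on chromosome 1 for a given disease, together with the reference allele) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for the planned mysql/php setup, and the table name, column names and rows are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical variant table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variant ("
        " disease TEXT, chromosome TEXT, position INTEGER,"
        " ref_allele TEXT, alt_allele TEXT)"
    )
    # Invented example rows; real variation files would be loaded here.
    rows = [
        ("disease_A", "1", 1000, "A", "G"),
        ("disease_A", "1", 2500, "C", "T"),
        ("disease_A", "2", 300, "G", "A"),
        ("disease_B", "1", 1000, "A", "C"),
    ]
    conn.executemany("INSERT INTO variant VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def variants_for(conn, disease, chromosome):
    """Return (position, ref_allele, alt_allele) rows ordered by position."""
    cur = conn.execute(
        "SELECT position, ref_allele, alt_allele FROM variant"
        " WHERE disease = ? AND chromosome = ? ORDER BY position",
        (disease, chromosome),  # parameters are bound, not string-formatted
    )
    return cur.fetchall()


conn = build_demo_db()
hits = variants_for(conn, "disease_A", "1")
```

Binding the disease and chromosome as query parameters, rather than pasting them into the SQL string, is the design choice that would also matter for a web-facing search interface.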
The goal of the project is to support the integrative efforts of the VIB Bioinformatics Core facility, with emphasis on integrating data from VIB’s research activities. Specifically, state-of-the-art bioinformatics software is used to integrate proteomics, metabolomics, and transcriptomics data sets. Limitations in the functionality of this software are identified and addressed to make it more user-friendly for biologists who have little or no computational experience, with the aim of facilitating biological discovery.
Rapid technological advances have led to the massive production of different types of biological (big) data and have enabled the construction of complex networks with various types of interactions between diverse biological entities. Single-omics analysis methods have been shown to be limited in dealing with such heterogeneous networked data. Integrative methods can collectively mine multiple types of biological data and produce more holistic, systems-level biological insights.
Principal component analysis (PCA) combined with sparse Partial Least Squares (sPLS), as implemented in the R package mixOmics, Weighted Gene Co-expression Network Analysis (WGCNA), and Multi-Omics Factor Analysis (MOFA) are used to summarize and interpret two data sets consisting of transcriptomics, proteomics and metabolomics experiments from Mus musculus and Medicago truncatula. These three robust statistical approaches were designed to uncover driver genes that are responsible for numerous cellular processes. While PCA is an appropriate and commonly used method, WGCNA offers several advantages in the analysis of highly multivariate, complex data by modelling them as networks and modules.
The WGCNA R software package incorporates functions for performing various aspects of weighted correlation network analysis, such as network construction, module detection, gene selection and visualization. MOFA is a factor analysis model for the integration of multi-omic data sets. Once trained, the model output can be used for downstream analyses, including the visualisation of samples in factor space and enrichment analysis. The mixOmics R package offers several multivariate methods suited to large omics data sets. These methods reduce the dimension of the data to a small number of components, which are then used to produce graphical outputs that show the relationships and correlation structure between the integrated omics data sets.
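To make the network construction step of weighted correlation network analysis concrete, the sketch below computes an unsigned weighted adjacency, a_ij = |cor(x_i, x_j)|^β, between expression profiles. It is written in plain Python for illustration and is not the WGCNA R package API; the gene names, the profiles and the choice β = 6 are assumptions made only for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def weighted_adjacency(profiles, beta=6):
    """Unsigned weighted adjacency |cor|^beta for every ordered gene pair."""
    adjacency = {}
    for gene_i, values_i in profiles.items():
        for gene_j, values_j in profiles.items():
            if gene_i != gene_j:
                r = pearson(values_i, values_j)
                adjacency[(gene_i, gene_j)] = abs(r) ** beta
    return adjacency


# Hypothetical expression profiles across four samples.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],
    "gene_c": [1.0, -1.0, 1.0, -1.0],
}
adj = weighted_adjacency(profiles)
```

Raising the absolute correlation to a power keeps strong correlations close to 1 while pushing weak ones towards 0, which is what makes the resulting network "weighted" rather than thresholded.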
Whereas the mixOmics and MOFA R packages can be run easily, the original WGCNA protocol contains a few functions that are designed to run on a single cluster or metabolite. To apply WGCNA to the above data sets, the open-source code base of WGCNA has been extended with additional utility functions. These extensions circumvent limitations that are hard for non-computational experts (i.e. biologists) to work around and that otherwise cost them a lot of time and, eventually, their interest in applying such advanced methods. As a result, an iterative WGCNA method was developed that allows users with little or no coding skills to run the code quickly and easily. This improved strategy greatly expands the general applicability of WGCNA and provides procedures that run in a loop to relate modules to external clinical traits and to identify important links between these traits and genes.
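The idea of looping over modules and traits can be illustrated with a minimal sketch. It is not the project's extension code and does not call the WGCNA package; it assumes that one summary profile per module (for example a module eigengene) is already available, and the module names, trait names, values and the cut-off `min_abs_r` are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relate_modules_to_traits(eigengenes, traits, min_abs_r=0.8):
    """Loop over every module/trait pair and keep the strongly correlated ones."""
    links = []
    for module, profile in eigengenes.items():
        for trait, values in traits.items():
            r = pearson(profile, values)
            if abs(r) >= min_abs_r:  # hypothetical cut-off, not a WGCNA default
                links.append((module, trait, round(r, 3)))
    return links


# Hypothetical module summary profiles and trait values for four samples.
eigengenes = {
    "blue": [0.1, 0.2, 0.3, 0.4],
    "brown": [0.4, -0.4, 0.4, -0.4],
}
traits = {
    "weight": [10.0, 20.0, 30.0, 40.0],
    "age": [5.0, 1.0, 5.0, 1.0],
}
links = relate_modules_to_traits(eigengenes, traits)
```

Because the loop covers every module and trait in one call, a user only has to supply the two tables rather than rerun the analysis by hand for each pair.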
The final step in this project is to provide a Jupyter Notebook with an SoS kernel suitable for executing WGCNA, MOFA and mixOmics in a user-friendly interface [Fig.1]. Jupyter Notebook was developed to let non-experts run scripts easily and to make analyses more reproducible. The provided notebook includes a function that merges the significant genes from the three methods to identify the genes they have in common.
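Merging the significant genes from the three methods amounts to a set intersection. The sketch below shows one way this could look; it is not the notebook's actual function, and the function name and gene identifiers are hypothetical.

```python
def common_genes(*gene_lists):
    """Return the sorted genes that occur in every supplied list."""
    if not gene_lists:
        return []
    shared = set(gene_lists[0])
    for genes in gene_lists[1:]:
        shared &= set(genes)  # keep only genes also reported by this method
    return sorted(shared)


# Hypothetical significant-gene lists, one per method.
wgcna_hits = ["Gene1", "Gene2", "Gene3"]
mofa_hits = ["Gene2", "Gene3", "Gene4"]
mixomics_hits = ["Gene3", "Gene2", "Gene5"]

shared = common_genes(wgcna_hits, mofa_hits, mixomics_hits)
```

Sorting the result gives a stable order, so repeated runs of the notebook produce the same list.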