

Contact details
Traineeship proposal

Traineeship topic, advanced bachelor's programme in bioinformatics 2015-2016: Semantic disambiguation of pharma and biotech companies in clinical trials and patents

Thesis summary 1, 2014-2015: Integration and semantic conversion of autism-related gene data in a linked data environment
The work described in this thesis was performed in collaboration with ONTOFORCE. The aim was to convert freely accessible data from AutDB and Sfari Gene, two databases containing curated gene information related to autism, into a semantic web format.
The source data was only accessible in HTML format, so scraping techniques had to be developed to capture the data from the databases. The data was first converted to CSV and subsequently to RDF, one of the standard formats for linked open data. The intermediate CSV step made it possible to write a general-purpose CSV-to-RDF program. The resulting RDF data, a large set of statements in the typical subject-predicate-object triple format, will be ready for integration into DISQOVER, a knowledge search system for life science and healthcare data developed by ONTOFORCE and based on semantic web and linked open data principles. By integrating the AutDB and Sfari Gene data into DISQOVER, people active in autism research will get a more integrated overview of the genetic information in that field.
To achieve this goal, the scraping and conversion process was implemented in Python scripts. Python and SPARQL, a SQL-like language for querying semantic web data, were learned during the internship by following tutorials, in order to reach a good level of scripting proficiency. The principles of the semantic web were also taught, to gain a good understanding of it. The project ended successfully at a stage where the data will shortly be ready for upload into DISQOVER. This will further enrich the data already present and generate new links between data that can be crucial in further research projects.
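SPARQL queries are normally run by a triple store or an RDF library, and the sketch below does not implement SPARQL. It only shows, in plain Python, the triple-pattern matching that a SPARQL basic graph pattern (the body of a `WHERE` block) performs: terms starting with `?` are variables, and patterns that share a variable are joined on it. The prefixed names and the gene data are hypothetical examples.

```python
def match_pattern(triple, pattern, binding):
    """Try to match one triple against one pattern under an existing binding.

    Terms starting with '?' are variables; anything else must match exactly.
    Returns the extended binding, or None if the triple does not fit.
    """
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.get(term, value) != value:
                return None  # variable already bound to a different value
            extended[term] = value
        elif term != value:
            return None
    return extended


def basic_graph_pattern(triples, patterns):
    """Return every binding that satisfies all patterns at once (a join on shared variables)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(triple, pattern, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical data and prefixes, for illustration only.
triples = [
    ("gene:SHANK3", "ex:associatedWith", "disease:autism"),
    ("gene:SHANK3", "ex:chromosome", "22"),
    ("gene:NRXN1", "ex:associatedWith", "disease:autism"),
    ("gene:NRXN1", "ex:chromosome", "2"),
]

# Roughly: SELECT ?gene ?chr WHERE { ?gene ex:associatedWith disease:autism . ?gene ex:chromosome ?chr }
results = basic_graph_pattern(triples, [
    ("?gene", "ex:associatedWith", "disease:autism"),
    ("?gene", "ex:chromosome", "?chr"),
])
for row in results:
    print(row["?gene"], row["?chr"])
```

The join on `?gene` is what lets a linked data query combine facts from different sources in one question, as long as they use the same identifier for the gene.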
Through this thesis, new “links for lives” will hopefully be found that help researchers search smarter for new insights to boost autism research (Ontoforce, 2015).
Thesis summary 2, 2014-2015: Integration and semantic conversion of oncological mutation data in a linked data environment
Semantic Web programming is a new technology in the world of informatics. Large amounts of data are available on the WWW, stored in databases. The Semantic Web offers a powerful and practical approach to gaining mastery over this multitude of information and information services. Semantics provide the leverage to make more information better, not overwhelmingly worse.
Creating a Semantic Web has a few requirements: new data representations are needed, along with some knowledge of computer science and insight into the data. Data in a Semantic Web expresses its own meaning, which adds a whole new dimension by attaching extra information to it.
This project concerns the integration and semantic conversion of somatic mutations in cancer; the data is captured from the Catalogue Of Somatic Mutations in Cancer (COSMIC) database. The purpose of this project is to convert this somatic mutation data into an environment in which searches are easier and more efficient. A Semantic Web is very valuable in cancer research, where scientists continually discover new technologies, methods and medicines to treat tumors. By storing this data in a Semantic Web, new solutions can be discovered and new technologies and methods developed.
The Semantic Web is developed by first creating an ontology of the COSMIC classification. On the basis of this ontology, the full COSMIC data can be converted into a linked data environment.
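The ontology-first approach can be sketched as follows, assuming the COSMIC classification is a hierarchy (for example a primary site refined by site subtypes) that can be read as paths from a root. Each level becomes an `owl:Class` linked to its parent with `rdfs:subClassOf`, and each sample record then points at the most specific class. The namespace `http://example.org/cosmic/`, the `primarySite` predicate, the sample ID and the site values are hypothetical placeholders, not COSMIC's own identifiers; only the RDF, RDFS and OWL IRIs are the standard W3C ones.

```python
from urllib.parse import quote

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
RDFS_SUBCLASS_OF = "<http://www.w3.org/2000/01/rdf-schema#subClassOf>"
OWL_CLASS = "<http://www.w3.org/2002/07/owl#Class>"
BASE = "http://example.org/cosmic/"  # hypothetical namespace


def class_iri(path):
    """IRI for a classification node, built from its full path from the root."""
    parts = (quote(p.replace(" ", "_"), safe="") for p in path)
    return f"<{BASE}class/{'/'.join(parts)}>"


def ontology_triples(paths):
    """Turn classification paths into owl:Class nodes linked by rdfs:subClassOf."""
    triples = set()  # a set, so shared ancestors are emitted only once
    for path in paths:
        for depth in range(1, len(path) + 1):
            node = path[:depth]
            iri = class_iri(node)
            label = node[-1].replace("\\", "\\\\").replace('"', '\\"')
            triples.add(f"{iri} {RDF_TYPE} {OWL_CLASS} .")
            triples.add(f'{iri} {RDFS_LABEL} "{label}" .')
            if depth > 1:
                # each level is a subclass of the level above it
                triples.add(f"{iri} {RDFS_SUBCLASS_OF} {class_iri(node[:-1])} .")
    return sorted(triples)


def sample_triples(sample_id, site_path):
    """Link one sample to its most specific site class (hypothetical predicate)."""
    subject = f"<{BASE}sample/{quote(sample_id, safe='')}>"
    return [f"{subject} <{BASE}primarySite> {class_iri(site_path)} ."]


# Hypothetical classification paths and sample, for illustration only.
site_paths = [("lung", "upper lobe"), ("lung", "lower lobe")]
for line in ontology_triples(site_paths) + sample_triples("S0001", ("lung", "upper lobe")):
    print(line)
```

Because samples link to classes rather than to free-text labels, a query that follows `rdfs:subClassOf` can retrieve samples recorded under any lung subtype when asked for lung samples.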


Ottergemsesteenweg Zuid 808
9000 Gent


Traineeship supervisor
Filip Pattyn