EMBL-EBI (Cambridge, UK)
Abstract 2018-2019: Enhancement of pseudochromosome building based on synteny and karyogram staining pattern comparison
Recent advances in genome sequencing technologies have enabled the generation of complete or partial genome assemblies for a wide variety of species. Unlike the assemblies of traditional reference organisms such as Mus musculus and Homo sapiens, many of these genomes are limited to the contig/scaffold level. The lack of a chromosome-level assembly can severely limit comparative genome analysis between species, e.g. a gene order comparison or a phylogenetic analysis. For this reason, projects such as the Vertebrate Genomes Project (VGP), whose goal is to create new reference genomes, aim to obtain chromosome-level assemblies. At EBI, as part of the VGP, a pipeline has been constructed to build these new reference genomes.
Although it is now possible to assemble genomes at the chromosome level, only a few computational tools are available for this task (e.g. Ragout). Ragout has been specifically designed for ordering scaffolds, but it cannot produce the nearly error-free results that are necessary for creating pseudochromosomes. Furthermore, it cannot use faulty scaffolds that have been split. To address these limitations, we have developed a software package called SynChroBuild. In the last two months, the code for most functions of the SynChroBuild scripts has been written. The long-term goal of this project is to support other types of input data, such as BioNano optical maps, in order to increase the accuracy of the scaffold alignment.
Currently, the SynChroBuild package performs a series of necessary steps throughout the pseudochromosome building pipeline, functioning as a bridge between the karyogram map and the BLASTN result. The pipeline itself consists of two major sections: the creation of a comparative synteny map, and the construction of the pseudochromosome genome.
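To illustrate how a BLASTN result can be related to reference chromosomes, the following is a minimal sketch and not the SynChroBuild code itself. It assumes the BLASTN result is in the standard 12-column tabular format (`-outfmt 6`), with scaffolds as queries and reference chromosomes as subjects. The function name `summarise_hits` and the rule of choosing the chromosome with the most aligned bases are assumptions made for illustration only.

```python
from collections import defaultdict


def summarise_hits(blast_lines):
    """Assign each scaffold to the reference chromosome with the most aligned bases.

    Assumes BLASTN tabular output (-outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    Returns {scaffold: (chromosome, aligned_bases, orientation)}.
    """
    aligned = defaultdict(lambda: defaultdict(int))  # scaffold -> chromosome -> bases
    strand = defaultdict(int)  # (scaffold, chromosome) -> forward minus reverse bases

    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        scaffold, chrom = fields[0], fields[1]
        length = int(fields[3])
        sstart, send = int(fields[8]), int(fields[9])
        aligned[scaffold][chrom] += length
        # In tabular BLASTN output a reverse-strand hit has sstart > send.
        strand[(scaffold, chrom)] += length if sstart <= send else -length

    summary = {}
    for scaffold, per_chrom in aligned.items():
        best = max(per_chrom, key=per_chrom.get)
        orientation = "+" if strand[(scaffold, best)] >= 0 else "-"
        summary[scaffold] = (best, per_chrom[best], orientation)
    return summary
```

A real pipeline would also need to filter hits by identity or e-value and to handle overlapping alignments, which this sketch deliberately ignores.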
The SynChroBuild package contains two scripts, one for each pipeline section. The first script uses a BLASTN result to create SVG files in which scaffolds are aligned to their matching reference chromosome(s). By combining these files with a corresponding karyogram, the comparative synteny map can be constructed. The second script builds the FASTA file of the pseudochromosomes from a synteny file, which is derived from the comparative synteny map created in the first section of the pipeline. In addition to these primary functions, SynChroBuild has several support functions. It can produce AGP files (which describe scaffold order and orientation), a BLASTN hit summary file and a synteny summary file, and it can convert SVG files into PDF format.
So far, pseudochromosomes have been built for three species using this new code. While SynChroBuild has made most steps of the pipeline easier, there is still room for improvement. The code for integrating BioNano optical map data into the scripts still has to be finished; once completed, it is expected to substantially increase the accuracy of scaffold ordering. Currently, the longest and most labour-intensive step of the pipeline is the construction of the comparative synteny map. The eventual goal of the SynChroBuild project is a tool capable of integrating inputs from multiple sequencing techniques and from karyotype data in order to construct the pseudochromosome genome by itself.
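The step of turning an ordered, oriented list of scaffolds into a pseudochromosome FASTA record with matching AGP rows can be sketched as follows. This is a minimal illustration under stated assumptions, not the SynChroBuild implementation: the function names, the `placements` input structure, the use of 100 bp gaps of unknown size (AGP component type `U`), and the `align_genus` linkage evidence value are all choices made for the example.

```python
# Translation table for complementing DNA; N stays N.
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_pseudochromosome(name, placements, scaffolds, gap_length=100):
    """Join scaffolds into one pseudochromosome and describe it as AGP rows.

    placements: ordered list of (scaffold_id, orientation), orientation "+" or "-"
                (hypothetical stand-in for the contents of a synteny file).
    scaffolds:  dict mapping scaffold_id to its sequence.
    Returns (sequence, agp_lines), where agp_lines are tab-separated strings.
    """
    pieces, agp_lines = [], []
    pos, part = 1, 1  # AGP coordinates are 1-based and inclusive
    for index, (scaffold_id, orientation) in enumerate(placements):
        if index > 0:
            # Gap of unknown size between neighbouring scaffolds (assumption).
            pieces.append("N" * gap_length)
            gap_row = [name, pos, pos + gap_length - 1, part,
                       "U", gap_length, "scaffold", "yes", "align_genus"]
            agp_lines.append("\t".join(map(str, gap_row)))
            pos += gap_length
            part += 1
        seq = scaffolds[scaffold_id]
        pieces.append(seq if orientation == "+" else reverse_complement(seq))
        row = [name, pos, pos + len(seq) - 1, part,
               "W", scaffold_id, 1, len(seq), orientation]
        agp_lines.append("\t".join(map(str, row)))
        pos += len(seq)
        part += 1
    return "".join(pieces), agp_lines


def to_fasta(name, seq, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{name}\n{body}\n"
```

For example, `build_pseudochromosome("chr1", [("s1", "+"), ("s2", "-")], scaffolds)` would place `s1` as given, insert a gap, and append the reverse complement of `s2`; the returned AGP rows record the same order and orientation.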
Tags: bioinformatics
Address
Hinxton, Saffron Walden CB10 1SD
United Kingdom
Contacts
Traineeship supervisor
Thomas Keane
+44 (0) 1223 49 4349, tk2@ebi.ac.uk
Traineeship supervisor
Jingtao Lilue
jl17@ebi.ac.uk