
University of Pretoria, Bioinformatics and Computational Biology Unit

Contact details
Traineeship proposition
Traineeship topic 2010-2011:
The trainee will acquire basic and some advanced skills in bioinformatics, including applied analysis skills and programming skills. The skills will involve tasks related to the in silico analysis of genome and proteome data from a variety of organisms, including bacteria, parasites, insects and plants.
Thesis summary 2013-2014: A comparison of somatic mutation callers in breast cancer samples and matched blood samples
Identifying somatic single nucleotide variations (SNVs), insertions, deletions and single nucleotide polymorphisms (SNPs) is indispensable for understanding tumors and cancers, especially in the context of therapeutic anti-cancer treatments.
Here, six tools are compared on their ability to detect somatic SNVs in two samples. Samples 112 and 181 each consist of DNA from breast cancer tissue and matched normal blood DNA from the same person. The two samples come from two patients with a hereditary history of breast cancer.
Both samples were preprocessed before being run through the mutation callers. Preprocessing included filtering on nucleotide quality, trimming on read length, mapping to the human reference genome (version hg19) and other steps specific to the GATK toolkit.
The six mutation callers are MuTect, VarScan, EBCall, JointSNVMix2 (hereafter JointSNVMix), SomaticSniper and Strelka. They were all run with standard settings on the two samples, which resulted in a striking difference in the number of somatic SNVs detected. JointSNVMix detected a markedly larger number than all other tools on both samples. VarScan, Strelka and JointSNVMix detected the most SNVs in common and were selected for further analysis. MuTect, EBCall and SomaticSniper were excluded from further analysis for several reasons, including the inability to change parameters and a higher number of independent somatic SNVs called.
VarScan, Strelka and JointSNVMix were run again with optimized parameters. The minimum depth of coverage was set to 10, and the threshold and prior probability settings of all three mutation callers were adjusted. For sample 112, only 19 (2%) of the 990 somatic SNVs detected in total were called in common. For sample 181, 100 (3%) of the 3906 somatic SNVs detected in total were called in common.
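The overlap figures above can be illustrated with a minimal Python sketch. It assumes that each caller's output has already been reduced to a set of (chromosome, position, reference allele, alternate allele) tuples and that the "total" is the union of all calls; neither assumption is confirmed by the summary. The function name `concordance` and the toy call sets are hypothetical and are not data from this study.

```python
from typing import Dict, Set, Tuple

# One SNV call: (chromosome, position, reference allele, alternate allele).
Snv = Tuple[str, int, str, str]


def concordance(calls: Dict[str, Set[Snv]]) -> Tuple[int, int, float]:
    """Return (shared, total, percent) for a mapping of caller name -> call set.

    shared  = number of SNVs reported by every caller (set intersection)
    total   = number of distinct SNVs reported by any caller (set union)
    percent = 100 * shared / total, or 0.0 when nothing was called
    """
    sets = list(calls.values())
    if not sets:
        return 0, 0, 0.0
    shared = set.intersection(*sets)
    total = set.union(*sets)
    percent = 100.0 * len(shared) / len(total) if total else 0.0
    return len(shared), len(total), percent


# Hypothetical toy call sets, for illustration only.
toy_calls = {
    "VarScan": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")},
    "Strelka": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr4", 400, "T", "C")},
    "JointSNVMix": {("chr1", 100, "A", "G"), ("chr5", 500, "G", "C")},
}

shared, total, percent = concordance(toy_calls)
print(f"{shared} of {total} SNVs called by all tools ({percent:.0f}%)")
```

With the counts reported in the summary, the same ratio gives 100 * 19 / 990 ≈ 2% for sample 112 and 100 * 100 / 3906 ≈ 3% for sample 181.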
The reason why so few somatic SNVs are detected in common by the six mutation callers is unclear. The specificity and accuracy of these tools still need to improve considerably. There is also a lack of thorough documentation on the options and settings of these tools.


FABI Square 3-3, Lunnon Road
0002 Pretoria
South Africa


Traineeship supervisor
Prof. Fourie Joubert
+27 124205825