
University of Pretoria, Bioinformatics and Computational Biology Unit


Abstract 2018-2019: Analysis of germ line variants with specific effect prediction of noncoding variants in cancer-related genes of black female South African breast cancer patients

Breast cancer is the most widely described cancer in women and ranks second among cancers with high incidence worldwide. It occurs far more often in women than in men, at a ratio of about 100 to 1 (1,2). Historically, breast cancer was rated the second most common cancer in North America, behind lung cancer (3,4,5). According to Society AC. et al., breast cancer outnumbers lung cancer as the most prominent cancer (6). In South Africa, studies report a lifetime breast cancer risk for women of 1 in 28, with 0.7% of all deaths caused by breast cancer.

A total of 166 blood samples were previously collected from black South African women with breast carcinoma who visited the Oncology Clinic at Steve Biko Hospital. Patients gave consent for all samples, following ethics approval for the study. DNA was extracted from the peripheral blood samples using the procedure described by Johns and Paulus-Thomas (1989). All samples were tested for BRCA mutations, and all tested negative. The samples were subsequently analysed for germ line variants in selected cancer-related genes.

After quality analysis, the first step was trimming the reads. The FASTX-Toolkit was used to trim the 100 bp paired-end reads at nucleotides 5 and 95 on the 5' and 3' ends, respectively. The trimmed reads were then mapped against the hg19 reference genome with BWA-MEM. Samtools was used to view, sort and index the aligned reads, and Qualimap to calculate how many reads aligned to each gene of interest. Duplicate reads were then marked with Picard MarkDuplicates. Finally, GATK was used for base quality score recalibration, which outputs a recalibrated BAM or CRAM file.
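The preprocessing steps above can be sketched as an ordered list of commands. This is only an illustration under several assumptions that the abstract does not confirm: GATK4-style syntax (with Picard MarkDuplicates invoked through the `gatk` wrapper), trimming interpreted as keeping bases 5 through 95 of each read, an Illumina read group, and hypothetical file names. The helper `build_preprocessing_steps` and the `Step` class are invented for this sketch, and the Qualimap step is left out. The function builds the commands without running them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    argv: List[str]               # command and its arguments
    stdout: Optional[str] = None  # file that should receive stdout, if any


def build_preprocessing_steps(sample: str, ref_fasta: str, known_sites_vcf: str,
                              fastq_r1: str, fastq_r2: str) -> List[Step]:
    """Return the per-sample commands in order, without executing them."""
    # All output file names below are hypothetical.
    trimmed_r1 = f"{sample}_R1.trimmed.fastq"
    trimmed_r2 = f"{sample}_R2.trimmed.fastq"
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    recal_table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"
    # Assumed read-group header; GATK tools expect reads to carry one.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"

    steps: List[Step] = []
    # Trimming, assumed to mean "keep bases 5 through 95" of every read.
    for raw, trimmed in ((fastq_r1, trimmed_r1), (fastq_r2, trimmed_r2)):
        steps.append(Step(["fastx_trimmer", "-f", "5", "-l", "95",
                           "-i", raw, "-o", trimmed]))
    # Mapping with BWA-MEM; the SAM output is written to stdout.
    steps.append(Step(["bwa", "mem", "-R", read_group,
                       ref_fasta, trimmed_r1, trimmed_r2], stdout=sam))
    # Sort and index the alignments.
    steps.append(Step(["samtools", "sort", "-o", sorted_bam, sam]))
    steps.append(Step(["samtools", "index", sorted_bam]))
    # Mark duplicate reads.
    steps.append(Step(["gatk", "MarkDuplicates", "-I", sorted_bam,
                       "-O", dedup_bam, "-M", f"{sample}.dup_metrics.txt"]))
    # Base quality score recalibration: build the model, then apply it.
    steps.append(Step(["gatk", "BaseRecalibrator", "-R", ref_fasta,
                       "-I", dedup_bam, "--known-sites", known_sites_vcf,
                       "-O", recal_table]))
    steps.append(Step(["gatk", "ApplyBQSR", "-R", ref_fasta, "-I", dedup_bam,
                       "--bqsr-recal-file", recal_table, "-O", recal_bam]))
    return steps
```

Each `Step` could then be executed with `subprocess.run(step.argv, check=True)`, redirecting stdout to `step.stdout` where it is set. Keeping command construction separate from execution makes the order of the pipeline easy to inspect before anything is run.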
After the recalibrated base quality scores had been applied, the next step was variant calling, for which the GATK HaplotypeCaller was used in gVCF mode. Variant calling consisted of several steps, which are documented in the electronic notebook. The called variants were then filtered using a specified cut-off value. This involved two tools: first, the variant type (SNPs or indels) was selected with the GATK tool SelectVariants; then the GATK VariantFiltration tool was used to filter variant calls based on INFO and/or FORMAT annotations. Before selection, variants with a frequency of ≥ 1% in the ExAC African (ExAC AFR) population, i.e. the polymorphisms, were removed. In the last step, the Ensembl Variant Effect Predictor (VEP) was used. VEP is a powerful toolset for the analysis, annotation and prioritization of genomic variants in coding and non-coding regions, and it provides access to an extensive collection of genomic annotation. For this study, the results of two functional effect predictors, CADD and GWAVA, were considered, and a variant was selected only if both methods predicted it to be deleterious. Moreover, VEP provided results for multiple transcripts per gene, as determined by mapping RefSeq identifiers to Ensembl transcripts via UCSC.
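The selection logic described above (remove polymorphisms at ≥ 1% in ExAC AFR, then keep variants that both CADD and GWAVA predict to be deleterious) can be illustrated with a minimal sketch. The 1% frequency limit comes from the text; the CADD and GWAVA cut-offs below are placeholders, since the abstract does not state the values used. The `Variant` class, its field names, and the choice to treat variants absent from ExAC as rare are likewise assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

EXAC_AFR_MAX_FREQ = 0.01  # from the text: variants at >= 1% are removed
CADD_MIN_PHRED = 15.0     # hypothetical cut-off, not given in the abstract
GWAVA_MIN_SCORE = 0.5     # hypothetical cut-off, not given in the abstract


@dataclass
class Variant:
    variant_id: str
    exac_afr_freq: Optional[float]  # None when the variant is absent from ExAC
    cadd_phred: Optional[float]     # None when no CADD score is available
    gwava: Optional[float]          # None when no GWAVA score is available


def is_rare(v: Variant) -> bool:
    """Keep variants below the 1% limit; absent variants count as rare (assumption)."""
    return v.exac_afr_freq is None or v.exac_afr_freq < EXAC_AFR_MAX_FREQ


def is_predicted_deleterious(v: Variant) -> bool:
    """True only if BOTH predictors call the variant deleterious."""
    if v.cadd_phred is None or v.gwava is None:
        return False
    return v.cadd_phred >= CADD_MIN_PHRED and v.gwava >= GWAVA_MIN_SCORE


def prioritise(variants: List[Variant]) -> List[Variant]:
    """Apply the frequency filter first, then the two-predictor agreement rule."""
    return [v for v in variants if is_rare(v) and is_predicted_deleterious(v)]


if __name__ == "__main__":
    # Made-up records, for demonstration only.
    demo = [
        Variant("var_a", 0.002, 22.0, 0.7),  # rare, both deleterious
        Variant("var_b", 0.050, 25.0, 0.9),  # common polymorphism
        Variant("var_c", None, 18.0, 0.3),   # GWAVA below cut-off
    ]
    for v in prioritise(demo):
        print(v.variant_id)
```

Requiring agreement between the two predictors is a conservative choice: it reduces the number of candidates at the cost of discarding variants that only one method flags.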

Address

Lunnon Road
Hillcrest, 0001 Pretoria
South Africa

Contacts

Traineeship supervisor
Fourie Joubert
fourie.joubert@up.ac.za