
Abstract advanced bachelor of bioinformatics 2019-2020: Automated detection of cell populations in B-ALL using flowSOM

B-cell acute lymphoblastic leukemia (B-ALL) is a type of blood cancer characterized by a maturation arrest in the B-cell lineage of the lymphocytes, which results in a large number of immature lymphocytes. The different maturation stages of B cells in human bone marrow are detected by immunophenotypic analysis through flow cytometry. An advantage of flow cytometry is its ability to detect a small population of residual leukemic cells. This detection of minimal residual disease (MRD) is an important prognostic factor for B-ALL. As the number of markers used in flow cytometry increases, so does the complexity of the analysis. As a consequence, the analysis based on bi-parametric plots becomes more time-consuming and more dependent on the skills and knowledge of the researcher. Automated clustering algorithms offer an alternative: an unsupervised and objective method to identify cell populations in a multi-dimensional way.
The goal of this traineeship was to build a pipeline to automatically detect cell populations in B-ALL. After careful consideration, FlowSOM was chosen as the clustering algorithm because of its speed, overall performance and visualization possibilities. FlowSOM is a clustering tool based on artificial neural networks. It creates a self-organizing map (SOM) in which every event in the data is assigned to a node. These nodes are connected to each other to form a grid in which closely connected nodes resemble each other more than nodes connected through a long path. The SOM can be visualized as a minimal spanning tree (MST), and meta-clustering can be performed to obtain a final clustering, which makes FlowSOM suitable for rapid and easy exploratory analysis.
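The two ideas in the paragraph above, assigning each event to its closest node and linking the nodes into an MST, can be illustrated with a minimal Python sketch. This is not the FlowSOM implementation: the SOM training step is skipped, the node vectors and events are hypothetical values in a two-marker space, and Prim's algorithm is used here only as one common way to build a spanning tree.

```python
import math

def nearest_node(event, nodes):
    """Return the index of the node whose vector is closest to the event."""
    return min(range(len(nodes)), key=lambda i: math.dist(event, nodes[i]))

def minimal_spanning_tree(nodes):
    """Prim's algorithm: return the MST edges (i, j) over the node vectors."""
    in_tree = {0}
    edges = []
    while len(in_tree) < len(nodes):
        # Pick the shortest edge that links the tree to a node outside it.
        i, j = min(
            ((a, b) for a in in_tree for b in range(len(nodes)) if b not in in_tree),
            key=lambda e: math.dist(nodes[e[0]], nodes[e[1]]),
        )
        in_tree.add(j)
        edges.append((i, j))
    return edges

# Hypothetical, already-trained node vectors (illustrative values only).
nodes = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
# Hypothetical events measured on the same two markers.
events = [(0.1, 0.2), (0.9, -0.1), (5.2, 4.9), (6.1, 5.3)]

assignments = [nearest_node(e, nodes) for e in events]
mst_edges = minimal_spanning_tree(nodes)
```

In this toy example, nodes that lie close together in marker space end up adjacent in the tree, while the two distant groups are joined by a single long edge.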
I used the publication “An R-Derived FlowSOM Process to Analyze Unsupervised Clustering of Normal and Malignant Human Bone Marrow Classical Flow Cytometry Data” (Lacombe et al., 2019) as a guideline to set up the pipeline. The data were preprocessed with the Kaluza analysis software to compensate for spectral overlap and to apply manual gating on the cell type of interest. The same software was used to export the compensated data to a comma-separated values (CSV) file, which served as input for the pipeline. The pipeline itself is written in the R programming language and consists of scripts to read, write and process the data.
The pipeline can be split into preprocessing steps and data analysis. The preprocessing steps are implemented as multiple scripts. These scripts first convert the CSV file into a Flow Cytometry Standard (FCS) file; a normalization step can then be applied, and the datasets are tagged and merged. Most of these tasks were performed with the flowCore package.
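The tagging and merging step described above can be sketched as follows. This is only an illustration in Python using the standard library, not the R/flowCore scripts of the pipeline. It omits the FCS conversion and the optional normalization, and the marker names, values and the `sample_tag` column name are hypothetical.

```python
import csv
import io

def read_events(csv_text):
    """Parse CSV text into a list of dicts with float marker values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: float(value) for name, value in row.items()} for row in reader]

def tag_and_merge(datasets):
    """Add a numeric tag identifying the source dataset, then concatenate."""
    merged = []
    for tag, events in enumerate(datasets, start=1):
        for event in events:
            merged.append({**event, "sample_tag": float(tag)})
    return merged

# Hypothetical exports: marker names and values are illustrative only.
reference_csv = "CD19,CD10\n1200.0,300.0\n1100.0,250.0\n"
patient_csv = "CD19,CD10\n900.0,4100.0\n"

merged = tag_and_merge([read_events(reference_csv), read_events(patient_csv)])
```

Storing the tag as a numeric column keeps every event traceable to its source sample after merging, which is what allows the samples to be separated again in a later analysis step.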
The second part of the pipeline is the actual data analysis. It consists of scripts to produce a ‘frozen’ MST representation of the reference normal bone marrow (NBM) sample, to analyse new data within that frozen representation, and to perform a FlowSOM analysis with a free representation of the MST. The FlowSOM package was used for the unsupervised analysis, while flowCore was used to read and write the data files.
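One way to picture the frozen representation is to keep the nodes learned from the reference sample fixed and map new events onto them, so that both samples can be compared node by node. The Python sketch below rests on that assumption. It does not use the FlowSOM package, and the node vectors and events are hypothetical values.

```python
import math
from collections import Counter

def node_fractions(events, nodes):
    """Map events onto fixed nodes and return the fraction of events per node."""
    counts = Counter(
        min(range(len(nodes)), key=lambda i: math.dist(event, nodes[i]))
        for event in events
    )
    return [counts[i] / len(events) for i in range(len(nodes))]

# Hypothetical frozen nodes learned from a reference sample (illustrative values).
frozen_nodes = [(0.0, 0.0), (4.0, 4.0)]
reference_events = [(0.1, 0.0), (0.2, -0.1), (3.9, 4.1), (4.2, 4.0)]
new_events = [(0.0, 0.1), (4.1, 3.9), (3.8, 4.2), (4.0, 4.1)]

reference_profile = node_fractions(reference_events, frozen_nodes)
new_profile = node_fractions(new_events, frozen_nodes)
# Per-node change in the share of events relative to the reference.
shift = [n - r for n, r in zip(new_profile, reference_profile)]
```

Because the nodes do not move, a node that holds a larger share of events in the new sample than in the reference points to a population that deserves closer inspection.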
The results of the pipeline were evaluated by a post-analysis in Kaluza. By extracting the X- and Y-coordinates of the MST from the FlowSOM object and adding them as new parameters to the FCS files, we were able to reproduce the MST within Kaluza. To add more value to the MST, we included the consensus meta-clustering result as an extra parameter. This facilitated the identification of the different cell populations. Node-by-node examination can still be done where necessary.
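The annotation step above amounts to giving every event the layout coordinates and the meta-cluster of the node it belongs to. The following Python sketch shows that idea under stated assumptions: the column names `mst_x`, `mst_y` and `metacluster`, the node layout and all values are hypothetical, and writing the result back to an FCS file is left out.

```python
def annotate_events(events, assignments, node_xy, node_metacluster):
    """Append each event's node coordinates and meta-cluster as extra columns."""
    rows = []
    for event, node in zip(events, assignments):
        x, y = node_xy[node]
        rows.append(
            {**event, "mst_x": x, "mst_y": y, "metacluster": node_metacluster[node]}
        )
    return rows

# Hypothetical inputs (illustrative values only).
events = [{"CD19": 1200.0}, {"CD19": 900.0}, {"CD19": 1150.0}]
assignments = [0, 1, 0]                  # node index assigned to each event
node_xy = [(10.0, 20.0), (30.0, 5.0)]    # tree layout coordinates per node
node_metacluster = [1, 2]                # meta-cluster label per node

annotated = annotate_events(events, assignments, node_xy, node_metacluster)
```

With these extra columns, plotting `mst_x` against `mst_y` in any flow cytometry software places the events at the position of their node, and the meta-cluster column can be used for gating or colouring.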
In conclusion, the pipeline provides a rapid, easy and objective identification of the different cell populations, which was the main goal of the traineeship. Careful examination of the results is still advisable. To assess the robustness of the pipeline and its sensitivity for detecting MRD in B-ALL, a thorough statistical analysis should be performed.

Van Gassen S, Callebaut B, Van Helden MJ, et al. FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data. Cytometry A. 2015;87(7):636‐645.
Lacombe F, Lechevalier N, Vial JP, Béné MC. An R-Derived FlowSOM Process to Analyze Unsupervised Clustering of Normal and Malignant Human Bone Marrow Classical Flow Cytometry Data. Cytometry A. 2019;95(11):1191‐1197. doi:10.1002/cyto.a.23897


Wolstraat 105
1000 Brussel


Traineeship supervisor
Brigitte Cantinieaux
02 435 2070