Ultan Doherty

School of Computer Science and Statistics, Trinity College Dublin, the University of Dublin

Dublin, Ireland

gateTree: A user-informed tree algorithm for population identification in flow cytometry

We present gateTree, a semi-supervised decision tree algorithm for automated gating of flow cytometry data that uses user-provided information to implement population-specific variable selection, outlier removal, and pruning. To apply gateTree, the user specifies, for a selection of channels, whether each cell population of interest has positive or negative expression. The algorithm then recursively partitions the events according to whether their values for one of the selected channels lie above or below a threshold that the algorithm constructs to optimally split that channel. The sequence in which the channels are split is chosen to optimally identify the described populations; that is, the objective is to construct subsets of events with the same characteristics as those populations. We applied gateTree to a flow cytometry sample collected for a study measuring T cell proliferation in patients undergoing haemodialysis. Our analysis used pre-gated single-cell observations (n = 32,624) whose expression levels were measured across nine fluorescence channels. gateTree replicated four manual gating pathways with higher F1 scores than two state-of-the-art gating algorithms, FlowSOM (Van Gassen et al., 2015) and cytometree (Commenges et al., 2018).
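The recursive partitioning described above can be illustrated with a minimal sketch. The abstract does not specify how gateTree constructs its thresholds or ranks channels, so the code below substitutes a simple stand-in rule (split at the midpoint of the widest gap between sorted values, and split the channel with the widest gap first). The function names `split_threshold` and `gate` and the toy channel names are hypothetical, and outlier removal and pruning are omitted.

```python
def split_threshold(values):
    """Stand-in splitting rule (an assumption, not gateTree's criterion):
    return (gap width, midpoint) of the widest gap between sorted values."""
    s = sorted(values)
    gaps = [(s[i + 1] - s[i], (s[i + 1] + s[i]) / 2) for i in range(len(s) - 1)]
    return max(gaps)


def gate(events, signs):
    """Recursively isolate one user-described population.

    events: list of dicts mapping channel name -> measured value
    signs:  dict mapping channel name -> '+' or '-' (the user's description)
    """
    # Stop when every described channel has been split or too few events remain.
    if not signs or len(events) < 2:
        return events
    # Choose the remaining channel whose values separate most clearly.
    best = max(signs, key=lambda ch: split_threshold([e[ch] for e in events])[0])
    _, thr = split_threshold([e[best] for e in events])
    # Keep the side of the threshold that matches the described sign.
    if signs[best] == "+":
        kept = [e for e in events if e[best] > thr]
    else:
        kept = [e for e in events if e[best] <= thr]
    remaining = {ch: s for ch, s in signs.items() if ch != best}
    return gate(kept, remaining)


# Toy data: five events measured on two hypothetical channels.
events = [
    {"CD3": 0.1, "CD4": 0.2},
    {"CD3": 0.2, "CD4": 5.0},
    {"CD3": 5.0, "CD4": 0.1},
    {"CD3": 5.1, "CD4": 5.2},
    {"CD3": 4.9, "CD4": 5.1},
]
double_positive = gate(events, {"CD3": "+", "CD4": "+"})
```

Here `double_positive` holds the events that fall on the positive side of both splits. Describing a second population, for example `{"CD3": "+", "CD4": "-"}`, reruns the same procedure with a different target, which is the sense in which the splits are population-specific.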