
Abstract
CytoVI: Deep generative modeling of cytometry data across technologies
Flow cytometry was historically the first single-cell technology to measure millions of cellular states within minutes. Owing to their robustness and scalability, flow cytometry and related antibody-based single-cell technologies have become an indispensable part of routine clinical practice and have evolved into a powerful tool for exploratory research. In contrast to the intrinsically noisy and sparse data of most genomic single-cell technologies, antibody-based cytometry technologies offer high-resolution measurements of millions of cells across a wide dynamic range, facilitating the analysis of large patient cohorts. However, the analysis of multi-cohort studies is often obstructed by batch effects and by differences in the antibody panels or technology platforms used to analyze samples. Here, we present CytoVI, a deep generative model designed for the integration of data across antibody-based technologies. CytoVI removes technical variation in flow cytometry, CyTOF or CITE-seq data and embeds cells into a meaningful low-dimensional representation corresponding to a cell's intrinsic state. CytoVI performs favourably compared with existing tools in data integration tasks, imputes missing markers in experiments with overlapping antibody panels and predicts a cell's transcriptome if paired with CITE-seq data. We used CytoVI to generate an integrated B cell maturation atlas across 350 proteins from conventional mass cytometry data and to automatically detect T cell states associated with disease in a large cohort of non-Hodgkin B cell lymphoma patients measured by flow cytometry. Beyond its applicability to preclinical research, we show that CytoVI can automatically identify tumor cells in chronic lymphatic leukemia patients via transfer learning and predict a patient’s diagnosis in a fully automated fashion.
CytoVI therefore represents a powerful deep learning tool for preclinical research and enables accurate, automated analysis of immunophenotypes in patient samples in clinical settings.