Getting Started

XClone is a software tool designed for analyzing single-cell RNA sequencing data to identify somatic copy number alterations (CNAs) within individual cells. Somatic CNAs play a crucial role in various biological processes, particularly in cancer development.

XClone uses probabilistic modeling to estimate the copy number alteration status of each gene in each cell of a cancer sample. By analyzing gene expression levels (read depth) and the B allele frequency (BAF) of gene bins, XClone determines which cells are likely to carry CNAs and which genes are affected.
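The intuition behind the expression signal can be shown with a toy computation: normalise each cell's counts by library size, then divide by the mean expression of reference (normal) cells. This is a minimal sketch assuming a dense count matrix and a boolean mask of reference cells. The function name read_depth_ratio and the pseudocount are hypothetical. XClone itself models these signals probabilistically, so this raw ratio is not its implementation.

```python
import numpy as np

def read_depth_ratio(counts, is_ref, pseudocount=0.1):
    """Toy per-cell, per-gene expression ratio against reference cells.

    counts: (n_cells, n_genes) raw count matrix
    is_ref: boolean mask marking the reference (normal) cells
    """
    counts = np.asarray(counts, dtype=float)
    is_ref = np.asarray(is_ref, dtype=bool)
    # scale every cell to the mean library size to remove sequencing-depth effects
    lib_size = counts.sum(axis=1, keepdims=True)
    norm = counts / lib_size * lib_size.mean()
    # mean normalised expression of each gene across the reference cells
    ref_mean = norm[is_ref].mean(axis=0)
    # ratio > 1 hints at a gain, < 1 at a loss (before any modelling)
    return (norm + pseudocount) / (ref_mean + pseudocount)

counts = [[10, 10], [10, 10], [20, 10]]   # two reference cells, one tumour cell
ratio = read_depth_ratio(counts, [True, True, False])
print(np.round(ratio, 2))
```

In real data the ratio is noisy for lowly expressed genes, which is one reason a probabilistic model is used instead of thresholding it directly.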

The XClone tutorials guide you through using XClone to detect CNAs from single-cell RNA sequencing data, from preparing the input data to interpreting the results. Whether you are new to single-cell RNA sequencing or an experienced user, these tutorials provide a comprehensive introduction to CNA analysis with XClone.

Public Datasets

TNBC1 data: a triple-negative breast cancer (TNBC) sample that was assayed by droplet-based scRNA-seq (10x Genomics), from Gao et al. (2021).

BCH869 data: a glioma sample (BCH869) with histone H3 lysine27-to-methionine mutations (H3K27M-glioma), in which 489 malignant cells and 3 non-tumour cells were profiled by Smart-seq2, from Filbin et al. (2018).

Raw data matrices (in anndata format) for the RDR and BAF modules can be downloaded and imported directly (see the notebook tutorials). The data are available at demo_data.

Examples of XClone usage, with steps for reproducing the results, are provided as Jupyter notebooks in the examples folder.

To get started, refer to the tutorials analyzing the TNBC1 and BCH869 datasets, or follow the step-by-step records in TNBC1_tutorial and BCH869_tutorial.

XClone on new datasets

Preparing XClone input

For details on the preprocessing tool that converts BAM files into anndata.AnnData inputs for XClone, refer to the preprocessing page, XClone preprocessing.
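One step that allele-level preprocessing needs is aggregating per-SNP allele counts into gene bins and computing a BAF per bin. The sketch below is hypothetical (bin_baf is not an XClone function) and assumes the counts are already phased, so that the "B" allele refers to the same haplotype for every SNP in a bin. Without phasing, summing allele counts across SNPs would blur allelic imbalance. The actual pipeline is described on the preprocessing page.

```python
import numpy as np

def bin_baf(ad, dp, snp_to_bin, n_bins):
    """Aggregate phased SNP-level allele counts into bins and compute BAF.

    ad: (n_cells, n_snps) B-allele read counts
    dp: (n_cells, n_snps) total read depth at each SNP
    snp_to_bin: (n_snps,) integer bin index of each SNP
    """
    ad = np.asarray(ad, dtype=float)
    dp = np.asarray(dp, dtype=float)
    snp_to_bin = np.asarray(snp_to_bin, dtype=int)
    # one-hot SNP-to-bin assignment matrix, shape (n_snps, n_bins)
    assign = np.zeros((len(snp_to_bin), n_bins))
    assign[np.arange(len(snp_to_bin)), snp_to_bin] = 1.0
    ad_bin = ad @ assign
    dp_bin = dp @ assign
    # BAF is undefined (NaN) for bins with no coverage in a cell
    with np.errstate(invalid="ignore", divide="ignore"):
        baf = ad_bin / dp_bin
    return ad_bin, dp_bin, baf

ad = [[1, 2, 0], [0, 0, 0]]   # 2 cells x 3 SNPs
dp = [[2, 2, 4], [0, 0, 1]]
ad_bin, dp_bin, baf = bin_baf(ad, dp, snp_to_bin=[0, 0, 1], n_bins=2)
```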

Import package

import xclone
import numpy as np  # used for the probability arrays in the configuration examples

By default, XClone provides integrated functions (RDR, BAF and Combine) for CNA analysis, although specific configurations may need to be adjusted for your data. Each XClone module has its own configuration class. Select the module by setting module = "RDR", module = "BAF" or module = "Combine" in xclone.XCloneConfig(dataset_name = dataset_name, module = "RDR"). Two sets of default base configurations are provided, one for 10x Genomics scRNA-seq and one for SMART-seq; the default is 10x scRNA-seq. To use the default base configuration for a SMART-seq dataset, set set_smartseq = True in xclone.XCloneConfig().

To change parameter settings, sub-class or override the base configuration; the examples below override a few frequently used parameters, and config.py documents all available arguments. After overriding, call xconfig.display() to print the updated configuration before running the module, e.g., RDR_Xdata = xclone.model.run_RDR(RDR_adata, config_file = xconfig).
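The RDR, BAF and Combine examples on this page repeat the same handful of attribute assignments. A small helper can set them once. This is a sketch assuming, as in the examples, that configuration attributes are set by plain assignment; apply_common_settings is a hypothetical name, and a SimpleNamespace stands in for xclone.XCloneConfig so the snippet runs without XClone installed.

```python
from types import SimpleNamespace

def apply_common_settings(xconfig, outdir, cell_anno_key="cell_type",
                          ref_celltype="N", plot_cell_anno_key="Clone_ID"):
    """Hypothetical helper: set the attributes shared by all three module configs."""
    xconfig.outdir = outdir
    xconfig.cell_anno_key = cell_anno_key
    xconfig.ref_celltype = ref_celltype
    xconfig.xclone_plot = True
    xconfig.plot_cell_anno_key = plot_cell_anno_key
    return xconfig

# Stand-in object; with XClone installed you would pass the real config instead.
demo = apply_common_settings(SimpleNamespace(), outdir="xclone_out")
print(demo.ref_celltype, demo.plot_cell_anno_key)
```

Module-specific parameters (for example trans_t or concentration) would still be set individually after calling the helper.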

For a detailed description of the functions in XClone, refer to the API page, XClone API.

RDR module

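# dataset_name, out_dir and RDR_adata (the RDR input AnnData) are assumed to be
# defined beforehand, as in the tutorials.
# set_smartseq = True selects the SMART-seq defaults; omit it for 10x data.
xconfig = xclone.XCloneConfig(dataset_name = dataset_name, module = "RDR", set_smartseq = True)
xconfig.set_figure_params(xclone = True, fontsize = 18)
xconfig.outdir = out_dir
xconfig.cell_anno_key = "cell_type"        # cell annotation column in adata.obs
xconfig.ref_celltype = "N"                 # label of the reference (normal) cells
xconfig.top_n_marker = 25
xconfig.marker_group_anno_key = "cell_type"
xconfig.xclone_plot = True
xconfig.plot_cell_anno_key = "Clone_ID"    # annotation used to group cells in plots
xconfig.trans_t = 1e-6                     # probability of switching between states
xconfig.start_prob = np.array([0.3, 0.4, 0.3])  # start probabilities of the 3 RDR states

xconfig.display()

RDR_Xdata = xclone.model.run_RDR(RDR_adata,
            config_file = xconfig)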
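The RDR example sets start probabilities for three states and a very small trans_t. Assuming the RDR module uses the same uniform transition structure that the BAF example writes out explicitly (stay with probability 1 - (k - 1)t, move to each other state with probability t; this is an assumption for RDR), the expected number of consecutive genes or bins spent in one state is 1 / ((k - 1)t). With k = 3 and trans_t = 1e-6 that is about 500,000, which is why such a small value strongly discourages spurious state switches. The helper name expected_run_length is hypothetical.

```python
import numpy as np

# start probabilities from the RDR example; they must form a distribution
start_prob = np.array([0.3, 0.4, 0.3])
assert np.isclose(start_prob.sum(), 1.0), "start_prob must sum to 1"

def expected_run_length(n_states, t):
    """Mean number of consecutive steps spent in one state.

    Assumes self-transition probability 1 - (n_states - 1) * t, so the time
    spent in a state is geometric with mean 1 / ((n_states - 1) * t).
    """
    return 1.0 / ((n_states - 1) * t)

for t in (1e-6, 1e-3):
    print(f"trans_t={t:g}: ~{expected_run_length(len(start_prob), t):,.0f} steps per state")
```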

BAF module

# dataset_name, out_dir and BAF_adata (the BAF input AnnData) are assumed to be
# defined beforehand, as in the tutorials.
xconfig = xclone.XCloneConfig(dataset_name = dataset_name, module = "BAF", set_smartseq = True)
xconfig.set_figure_params(xclone = True, fontsize = 18)
xconfig.outdir = out_dir
xconfig.cell_anno_key = "cell_type"
xconfig.ref_celltype = "N"
xconfig.concentration = 35.5
xconfig.concentration_lower = 20
xconfig.concentration_upper = 100
xconfig.theo_neutral_BAF = 0.5             # theoretical BAF of allele-balanced regions

xconfig.xclone_plot = True
xconfig.plot_cell_anno_key = "Clone_ID"
xconfig.phasing_region_key = "chr"
xconfig.phasing_len = 100

xconfig.WMA_window_size = 6

xconfig.trans_t = 1e-6
xconfig.start_prob = np.array([0.2, 0.15, 0.3, 0.15, 0.2])  # sums to 1 over 5 states

# 5 x 5 transition matrix: stay with probability 1 - 4t, move to each other state with t
t = xconfig.trans_t
xconfig.trans_prob = np.array([[1-4*t, t, t, t, t],
                               [t, 1-4*t, t, t, t],
                               [t, t, 1-4*t, t, t],
                               [t, t, t, 1-4*t, t],
                               [t, t, t, t, 1-4*t]])
xconfig.CNV_N_components = 5               # number of BAF states, consistent with the arrays above

xconfig.BAF_denoise = True
xconfig.display()

BAF_merge_Xdata = xclone.model.run_BAF(BAF_adata,
            config_file = xconfig)
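The explicit 5 x 5 transition matrix in the BAF example follows a simple pattern: t off the diagonal and 1 - 4t on it, so every row sums to 1. A small helper can build the same matrix for any number of states, which keeps trans_prob consistent with CNV_N_components if the number of states changes. uniform_switch_matrix is a hypothetical name, not part of XClone.

```python
import numpy as np

def uniform_switch_matrix(n_states, t):
    """Square matrix with t off the diagonal and 1 - (n_states - 1) * t on it."""
    trans = np.full((n_states, n_states), t)
    np.fill_diagonal(trans, 1 - (n_states - 1) * t)
    return trans

t = 1e-6
explicit = np.array([[1-4*t, t, t, t, t],
                     [t, 1-4*t, t, t, t],
                     [t, t, 1-4*t, t, t],
                     [t, t, t, 1-4*t, t],
                     [t, t, t, t, 1-4*t]])
generic = uniform_switch_matrix(5, t)
# the generic construction reproduces the hand-written matrix, rows sum to 1
assert np.allclose(generic, explicit)
assert np.allclose(generic.sum(axis=1), 1.0)
```

With such a helper, the assignment in the BAF example could be written as xconfig.trans_prob = uniform_switch_matrix(xconfig.CNV_N_components, xconfig.trans_t).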

Combine module

# RDR_Xdata and BAF_merge_Xdata are the outputs of the RDR and BAF modules.
xconfig = xclone.XCloneConfig(dataset_name = dataset_name, module = "Combine")
xconfig.set_figure_params(xclone = True, fontsize = 18)
xconfig.outdir = out_dir

xconfig.cell_anno_key = "cell_type"
xconfig.ref_celltype = "N"

xconfig.copygain_correct = False

xconfig.xclone_plot = True
xconfig.plot_cell_anno_key = "Clone_ID"
xconfig.merge_loss = False
xconfig.merge_loh = True

xconfig.BAF_denoise = True
xconfig.display()

combine_Xdata = xclone.model.run_combine(RDR_Xdata,
                BAF_merge_Xdata,
                verbose = True,
                run_verbose = True,
                config_file = xconfig)
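Conceptually, the Combine step brings the two signals together: a change in read depth points to a copy gain or loss, whereas allelic imbalance in BAF without a read-depth change points to copy-neutral loss of heterozygosity (LOH). The toy rule set below illustrates that idea on per-bin state calls. The state codes and the function combine_calls are hypothetical, and this is not XClone's combination algorithm; see config.py for how options such as merge_loss and merge_loh affect XClone's output.

```python
import numpy as np

# Hypothetical RDR state codes, chosen only for this illustration
RDR_LOSS, RDR_NEUTRAL, RDR_GAIN = 0, 1, 2

def combine_calls(rdr_state, baf_imbalanced):
    """Toy per-bin combination of RDR state calls and BAF imbalance flags."""
    rdr_state = np.asarray(rdr_state)
    baf_imbalanced = np.asarray(baf_imbalanced, dtype=bool)
    out = np.full(rdr_state.shape, "neutral", dtype=object)
    out[rdr_state == RDR_GAIN] = "gain"
    out[rdr_state == RDR_LOSS] = "loss"
    # allelic imbalance without a read-depth change suggests copy-neutral LOH
    out[(rdr_state == RDR_NEUTRAL) & baf_imbalanced] = "cnLOH"
    return out

calls = combine_calls([RDR_LOSS, RDR_NEUTRAL, RDR_NEUTRAL, RDR_GAIN],
                      [True, False, True, True])
print(list(calls))
```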