M20 Genomics

VITA CytBase: Elevating Precision of Cell-Type Annotation in Single-Cell Transcriptome Analysis

Published: 2024-03

In the rapidly evolving landscape of single-cell transcriptome analysis, precision and accuracy are paramount. Researchers worldwide are increasingly turning to innovative technologies to dissect the complex cellular landscape with unprecedented resolution. Among these advancements, M20 Genomics distinguishes itself with its pioneering VITA Single-Cell Full-Length Transcriptome Sequencing Platform.

In August 2022, M20 Genomics unveiled "M20 Seq®" along with the VITA Single-Cell Full-Length Transcriptome Sequencing Platform, redefining the boundaries of single-cell transcriptomics. The VITA platform offers unparalleled features, including comprehensive compatibility across various species and sample types, coupled with full transcriptome capture, thereby greatly expanding the scope of single-cell technology and enabling researchers to explore uncharted territories of cellular heterogeneity.

Unlike poly(A)-based single-cell transcriptome technologies, the VITA platform captures RNA with random primers. While this approach enhances single-cell transcriptome analysis, it poses challenges for cell type annotation. Because the resulting data differ from poly(A)-based single-cell transcriptome data, annotation tools need to be optimized for the distinct characteristics of random primer-based data.


Navigating Complexity: Equipped with VITA CytBase and VITA Biscuit

To address this challenge, we have developed VITA CytBase, a database designed for precise annotation of cell types using data derived from the VITA platform. VITA CytBase offers a suite of annotation capabilities tailored to the unique features of VITA platform data, providing researchers with a reliable and accurate resource for random primer-based single-cell transcriptome analysis.

In conjunction with VITA CytBase, we have released VITA Biscuit, a complementary tool designed to enhance the functionality of VITA CytBase. Equipped with advanced analysis features, VITA Biscuit empowers researchers to gain meaningful insights from their single-cell transcriptome data with exceptional precision and efficiency.

With VITA CytBase and VITA Biscuit, researchers can confidently navigate the complexities of single-cell transcriptome analysis. This proficiency fosters groundbreaking discoveries in the intricate mechanisms underlying cellular functions.


Key Features and Benefits of VITA CytBase

Comprehensive Coverage:

VITA CytBase encompasses over 50 organ types across 13 distinct body systems in both human and mouse (Figure 1, human organ systems provided as an example), ensuring broad applicability across diverse research domains.

Figure 1: Overview of Human Systems and Organs Covered in VITA CytBase.

Continuous Updates:

Steady data integration and ongoing updates ensure the relevance and comprehensiveness of VITA CytBase, keeping pace with the latest advancements in single-cell transcriptomics.

High-Confidence Annotations:

Leveraging advanced bioinformatics and single-cell mRNA as well as lncRNA data sourced with the VITA platform, VITA CytBase offers researchers curated annotations for all tissues and organs, empowering precise insights into cellular heterogeneity.

Seamless Analysis:

Integrated within VITA CytBase, VITA Biscuit streamlines the process of automated cell type annotation, enhancing workflow efficiency and productivity.


Evolving Continuously: Developing VITA CytBase

Phase 1: Data Generation with the VITA Platform

To lay the foundation of VITA CytBase, we leveraged the advanced capabilities of the VITA platform to compile a comprehensive dataset from 49 diverse tumor samples (Figure 2), totaling over 800,000 cell nuclei. This substantial dataset serves as the cornerstone for creating a comprehensive pan-cancer single-cell tumor microenvironment atlas [1-2].

Figure 2: The Foundational Datasets Used in Phase 1 of VITA CytBase Development.

Following the successful release of VITA CytBase Phase 1, the development of VITA CytBase is advancing to subsequent phases, aiming to enhance users' ability to achieve comprehensive and precise cell type annotations.


Phase 2: Integration of Single-Cell Big Data

Leveraging the VITA platform, Phase 2 focuses on systematically assembling a substantial data repository derived from over 10,000 human and mouse samples, which will be integrated into the infrastructure of VITA CytBase.


Phase 3: Identification of Cell Type Markers for Each Tissue and Organ

Phase 3 involves the identification of marker genes for cell type annotations in individual tissues and organs through three key steps (Figure 3):

        Step 1: Cell type classification.

        Step 2: Non-negative matrix factorization (NMF), a machine learning method, to uncover potential marker genes.

        Step 3: Refinement of markers through pathway enrichment analysis (e.g., GSEA).

This iterative process ensures thorough analysis and validation, resulting in a repository of specific and high-confidence cell type markers for each tissue and organ.
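As an illustration of Step 2, the following is a minimal sketch of NMF on a toy cells-by-genes matrix, using the classic multiplicative-update rules. It is not the VITA CytBase pipeline: the function names (`nmf`, `top_genes`), the gene labels, and the expression values are all hypothetical, and a production analysis would typically rely on an established library rather than hand-written updates.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius distance between V and W @ H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Factor V (cells x genes) into non-negative W (cells x k) and H (k x genes)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors

def top_genes(h, genes, n_top=2):
    """For each factor, list the genes with the largest loadings."""
    return [[g for _, g in sorted(zip(row, genes), reverse=True)[:n_top]]
            for row in h]

# Hypothetical toy data: two groups of cells, each expressing its own gene pair.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
expression = [
    [4.0, 2.0, 0.0, 0.0],
    [6.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 6.0],
    [0.0, 0.0, 2.0, 4.0],
]
w, h, errors = nmf(expression, k=2)
candidates = top_genes(h, genes)
print(candidates)
```

Each row of `h` is a gene program, and the genes with the highest loadings in a program are candidate markers that would then go on to refinement, as in Step 3.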


Figure 3: Iterative Approach for Identifying Tissue and Organ-Specific Cell Type Markers in Phase 3.

Phase 4: Integrating Cell Type-Specific Marker Gene Sets from Various Tissues and Organs

Phase 4 entails integrating the diverse marker gene sets specific for different cell types obtained from various tissue and organ datasets using the VITA platform. Leveraging our extensive dataset and iterative methodology, this integration process aims to establish both pan-tissue/organ and tissue/organ-specific cell type marker gene sets within VITA CytBase.


Phase 5: Systematic Classification

Phase 5 employs hierarchical classification to systematically organize tissues, organs and body systems, curating cell type-specific marker gene sets for various systems and organs.

To complement cell annotation with VITA CytBase, we have developed the annotation tool VITA Biscuit. This tool integrates annotation methods based on SingleR[3] and diverse databases including CellMarker[4], CellTaxonomy[5], and VITA CytBase. It facilitates automated cell type annotation, streamlining the analysis for researchers.
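To make the idea of database-driven annotation concrete, here is a minimal sketch that scores a cluster's top genes against marker sets from several sources and picks the best-supported label. This is an assumption-laden illustration and not VITA Biscuit's actual algorithm: the functions `overlap_score` and `annotate`, the database names `toy_db_1` and `toy_db_2`, and their contents are hypothetical stand-ins for real marker databases.

```python
def overlap_score(cluster_genes, marker_genes):
    """Fraction of a cell type's marker genes found among the cluster's top genes."""
    markers = set(marker_genes)
    if not markers:
        return 0.0
    return len(markers & set(cluster_genes)) / len(markers)

def annotate(cluster_genes, databases):
    """Average per-database overlap scores and return (best label, all scores)."""
    per_type = {}
    for markers_by_type in databases.values():
        for cell_type, markers in markers_by_type.items():
            per_type.setdefault(cell_type, []).append(
                overlap_score(cluster_genes, markers)
            )
    scores = {ct: sum(vals) / len(vals) for ct, vals in per_type.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical marker sets; real databases are far larger and curated.
databases = {
    "toy_db_1": {
        "AT1": ["AGER", "PDPN"],
        "AT2": ["SFTPC", "SFTPB"],
        "Macrophage": ["CD68", "MRC1"],
    },
    "toy_db_2": {
        "AT2": ["SFTPC", "SFTPA1"],
        "Macrophage": ["CD68", "CD163"],
    },
}

# Hypothetical top genes for one cluster.
cluster_top_genes = ["SFTPC", "SFTPB", "SFTPA1", "NAPSA"]
label, scores = annotate(cluster_top_genes, databases)
print(label, scores)
```

Averaging across sources is only one possible design choice; weighting databases by confidence or combining marker scores with a reference-based method such as SingleR would be equally reasonable variations.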


Validation of VITA CytBase: Insights from the Initial Dataset

We are excited to present the first dataset obtained during Phase 1 of constructing VITA CytBase[1], which successfully identified 2,976 markers across 30 distinct cell types in pan-cancer tissue samples (Figure 4). This extensive marker set comprises 2,083 mRNAs, along with 893 lncRNAs, with marker counts per cell type ranging from 55 to 403, demonstrating the diversity and depth of the dataset.


Figure 4: Current Composition of Cell Type-Specific Markers in VITA CytBase (blue: mRNA marker; red: lncRNA marker).

Our data highlight the capabilities of VITA CytBase in facilitating cell type annotations with both mRNA and lncRNA markers (Figure 4). Comparing the top 3 mRNA markers (Figure 5A) and lncRNA markers (Figure 5B) of each cell type further shows their high specificity.

Figure 5: Top 3 mRNA (A) and lncRNA (B) Markers by Cell Types.

To validate the mRNA markers identified with VITA CytBase, we conducted a comparative analysis with well-established public databases, including CellMarker[4], the Human Cell Landscape (HCL)[6], PanglaoDB[7], and others. Of the 2,083 mRNA markers, 1,954 (93.8%) were verified in these public databases, supporting the accuracy of VITA CytBase. Moreover, leveraging our extensive VITA dataset, we identified 129 novel mRNA markers (Figure 6), further demonstrating VITA CytBase's capacity for marker identification.
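The reported figures can be cross-checked with simple arithmetic. The short sketch below uses only the counts stated in this article; the variable names are illustrative.

```python
# Counts as reported in the text.
total_markers = 2976
mrna_markers = 2083
lncrna_markers = 893
verified_mrna = 1954

# mRNA and lncRNA markers together should account for every marker.
assert mrna_markers + lncrna_markers == total_markers

# Share of mRNA markers also found in public databases.
verification_rate = verified_mrna / mrna_markers

# mRNA markers not found in public databases.
novel_mrna = mrna_markers - verified_mrna

print(f"{verification_rate:.1%}", novel_mrna)
```

The unverified remainder matches the 129 novel mRNA markers reported above, so the percentage and the counts are mutually consistent.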


Figure 6: Cross-Validation of Cell Type-Specific Markers Identified by VITA CytBase in Public Databases.

We further analyzed the expression patterns of selected mRNA markers (Figure 7A) and lncRNA markers (Figure 7B) across all cell types on a Uniform Manifold Approximation and Projection (UMAP) embedding. Notably, some of the newly identified mRNA markers (Figure 7A) have already been reported as potential marker genes for cell type annotation in previous studies[8-9]. This further supports the reliability and specificity of markers identified with VITA CytBase.


Figure 7: UMAP Clustering Depicting the Expression Patterns of Selected mRNA Markers (A) and lncRNA Markers (B).


Enhanced Cell Type Annotations: Comparative Evaluation of VITA Biscuit and Other Methods

To evaluate the annotation effectiveness of VITA CytBase on VITA single-cell transcriptome data, we conducted UMAP clustering and cell type annotation on data obtained from a lung cancer sample. We directly compared the performance of three annotation methods: SingleR automatic annotation[3], manual annotation, and VITA Biscuit.

The comparative evaluation revealed consistent annotation across all three methods. However, VITA Biscuit demonstrated superior capabilities in the identification of cell subtypes. Notably, it successfully subdivided epithelial cells into type I alveolar epithelial cells (AT1) and type II alveolar epithelial cells (AT2), while also accurately annotating an additional small population of macrophages (Figure 8).


Figure 8: Comparison of Cell Type Annotation Methods for VITA Single-Cell Transcriptome Data from a Lung Cancer Sample.

The enhanced precision of VITA Biscuit for identifying cell subtypes underscores its effectiveness in capturing nuanced distinctions within cell populations. This capability facilitates in-depth single-cell analysis, offering deeper insights into cellular heterogeneity.


Elevating Precision: Unveiling the Potential with VITA CytBase

As the field of single-cell transcriptomics continues to evolve, VITA CytBase remains steadfast in its commitment to precision and innovation. With ongoing advancements and collaborations, M20 Genomics aims to further enhance the capabilities of VITA CytBase, unlocking new frontiers in cellular biology and biomedical research.

We welcome collaboration with researchers around the world and encourage exploration through VITA CytBase and the VITA platform. Together, we can uncover the mysteries of cellular heterogeneity and pave the way for new scientific breakthroughs.



[1] Fan TQ, et al. Landscape and functional repertoires of long noncoding RNAs in the pan-cancer tumor microenvironment using single-nucleus total RNA sequencing. bioRxiv. 2023:569806.

[2] Chen H, et al. Pan-Cancer Single-Nucleus Total RNA Sequencing Using snHH-Seq. Advanced Science. 2024;11(5):e2304755.

[3] Aran D, et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nature Immunology. 2019;20(2):163–172.

[4] Hu C, et al. CellMarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA-seq data. Nucleic Acids Research. 2023;51(D1):D870–D876.

[5] Jiang S, et al. Cell Taxonomy: a curated repository of cell types with multifaceted characterization. Nucleic Acids Research. 2023;51(D1):D853–D860.

[6] Han X, et al. Construction of a human cell landscape at single-cell level. Nature. 2020;581(7808):303–309.

[7] Franzén O, et al. PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. Database. 2019;2019:baz046.

[8] Stengel S, et al. Peritoneal Level of CD206 Associates With Mortality and an Inflammatory Macrophage Phenotype in Patients With Decompensated Cirrhosis and Spontaneous Bacterial Peritonitis. Gastroenterology. 2020;158(6):1745–1761.

[9] Xiong L, et al. Single-cell RNA sequencing reveals B cell-related molecular biomarkers for Alzheimer's disease. Experimental & Molecular Medicine. 2021;53(12):1888–1901.
