When considering a single cell spatial imaging platform, independent researcher-led studies that compare assay performance can be helpful. Here, we highlight three recent studies (Cook et al., Wang et al., and Cervilla et al.) that compare 10x Genomics’ Xenium In Situ with the NanoString CosMx Spatial Molecular Imager and/or Vizgen MERSCOPE platforms.
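These studies, conducted by independent research groups, are deep dives on the different single cell spatial imaging platforms with a focus on cell segmentation and cell typing, specificity, consistency with orthogonal benchmarking, and transcript detection on shared genes.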
See a summary of key findings from their studies below, or jump down for a more in-depth overview of what the authors did and what they found!
In these publications, the authors demonstrated that:
Xenium was better able to spatially resolve and discriminate cell types than competitors
Xenium showed greater specificity
Xenium exhibited greater consistency with orthogonal benchmarking
Xenium detected more transcripts on shared genes
Why does it matter?
Single cell spatial imaging technologies require both reliable cell segmentation and transcript detection to ensure you’re capturing single cells and assigning correct transcripts to them.
How did the authors use it in these studies?
Cook et al. focused on the stroma, taking advantage of its cellular heterogeneity, and examined epithelial, fibroblast, muscle, and macrophage markers (among others) in segmented cells. They then characterized spatial organization based on the cell types identified in the tissue. Finally, they attempted to detect a rare cell type (CD8 T cells, defined by coexpression of CD3, CD8A, and GZMA).
Wang et al. first examined the expression of pairs of non-overlapping marker genes, rationalizing that well-segmented cells should have a low proportion of cells expressing both genes in a pair. They also looked at the cell types each platform was able to resolve in several tissue types based on canonical markers.
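The marker-pair check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `coexpression_fraction`, the `min_count` threshold, the choice of denominator (cells expressing at least one gene of the pair), and the toy counts are all assumptions made for the example.

```python
def coexpression_fraction(cells, gene_a, gene_b, min_count=1):
    """Fraction of cells expressing either marker that express both.

    `cells` is a list of per-cell {gene: transcript count} dicts.
    The denominator is an assumption of this sketch; a study could
    equally divide by all segmented cells.
    """
    either = 0
    both = 0
    for counts in cells:
        has_a = counts.get(gene_a, 0) >= min_count
        has_b = counts.get(gene_b, 0) >= min_count
        if has_a or has_b:
            either += 1
        if has_a and has_b:
            both += 1
    return both / either if either else 0.0


# Toy data (hypothetical): one B cell, one T cell, one suspicious doublet,
# and one epithelial cell that expresses neither marker.
cells = [
    {"CD19": 5, "CD3E": 0},
    {"CD19": 0, "CD3E": 4},
    {"CD19": 2, "CD3E": 3},
    {"EPCAM": 7},
]
frac = coexpression_fraction(cells, "CD19", "CD3E")
```

A low value suggests that segmentation and transcript assignment keep the two lineages apart; a high value points to segmentation errors or false-positive calls.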
Cervilla et al. characterized the relative abundance of cell types in each platform, as well as whether the cell types characterized in each technology exhibited patterns of spatial clustering. They also examined the accuracy of cell-typing composition as a function of total tissue area analyzed.
What did they find?
Both studies highlighted that the nuclear boundary expansion used by Xenium for cell segmentation* resulted in cells that were consistently larger than CosMx and MERSCOPE. In spite of this, the Wang group found that Xenium was able to clearly separate lineages denoted by non-overlapping marker genes (CD19 and CD3E for B and T cells, CD8 and CD4 for T cell subsets, and CD3E and EPCAM for epithelial cancer). MERSCOPE was also able to resolve these lineages. CosMx could distinguish EPCAM from CD3E but could not differentiate between the other pairs, which the authors hypothesized may be due to false positives or cell segmentation errors, given the low counts for these immune genes.
Wang et al. found that Xenium consistently captured more cell types than CosMx (Figure 1). Xenium identified 9 distinct cell types in each of the breast, lung, and breast cancer samples. CosMx only resolved 6, 6, and 8 cell types, respectively, and was unable to resolve all known major cell types in breast and lung. MERSCOPE identified 6 cell types in breast cancer, and researchers noted, “...a clearer one-to-one mapping between MERSCOPE and Xenium clusters than Xenium and CosMx clusters.”
Using snRNA-seq as their ground truth, Cook et al. found a greater-than-expected proportion of macrophage and fibroblast markers in muscle cells in Xenium data. Similarly, endothelial, muscle, and fibroblast markers were seen in pericytes, which they hypothesized was due to neighboring cell contamination. In CosMx data, they noted reduced specificity of marker genes and stated that it could be due to, “...a lower abundance of on-target markers and higher noise levels, rather than segmentation errors.”
Cook et al. attempted to detect CD8 T cells using both Xenium and CosMx. While Xenium identified a corresponding population, this population was absent in the CosMx data (Figure 2A). Additionally, while markers of lymphocytes—small cells with low RNA content—exhibited punctate expression in tissue on Xenium, in CosMx SMI these markers were distributed throughout the tissue and insufficiently expressed for cell characterization (Figure 2B).
Finally, Cervilla et al. noted that the gross characterization of cell types was broadly similar between platforms. When looking at the spatial distribution of cell types, they noticed that, while both technologies were able to characterize areas rich in CD8-positive T cells, only Xenium demonstrated clear spatial clustering of this cell type, likely due to the increased noise in CosMx versus Xenium (Figure 2C). They also found a correlation between the total amount of tissue analyzed and how well it recapitulated the cell composition of the entire tissue.
What was the authors’ takeaway?
“Xenium does seem to have a lower false discovery rate than CosMx, particularly in lowly expressed genes. We also showed that this affects the actual spatial distribution of the signal of many genes, as reflected by their Moran’s I, which tends to be lower in CosMx than in Xenium.” –Cervilla et al.
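For readers unfamiliar with Moran's I, the statistic cited in the quote above, the following is a minimal sketch of how it can be computed for one gene. It assumes binary spatial weights (two cells are neighbors if their centroids lie within a chosen `radius`); the function name, the weighting scheme, and the toy coordinates are assumptions of this example and not necessarily what Cervilla et al. used. Values near 1 indicate spatially clustered signal, values near 0 indicate random scatter, and negative values indicate dispersion.

```python
import math


def morans_i(coords, values, radius):
    """Moran's I with binary weights: w_ij = 1 if cells i and j are within `radius`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("expression is constant; Moran's I is undefined")
    weight_sum = 0.0
    cross = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                weight_sum += 1.0
                cross += dev[i] * dev[j]
    if weight_sum == 0:
        raise ValueError("no neighboring cells within radius")
    return (n / weight_sum) * (cross / denom)


# Four hypothetical cells in a row, one unit apart.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i(coords, [1, 1, 0, 0], radius=1.0)    # like values sit together
alternating = morans_i(coords, [1, 0, 1, 0], radius=1.0)  # like values never touch
```

In this toy case the clustered pattern scores higher than the alternating one, which is the direction of the difference the authors describe between Xenium and CosMx: added noise spreads signal across the tissue and pulls Moran's I down.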
*Note: When these studies were performed, the default Xenium cell segmentation was a 15 µm nuclear expansion. Since then, this distance has been reduced to 5 µm, and Xenium now has the option for morphology-driven cell segmentation with the Xenium Multimodal Cell Segmentation Kit.
Why does it matter?
Specificity—defined in this case as the proportion of called transcripts being true biological transcripts rather than noise—acts as the basis for accurate interpretation of sensitivity (see below), dynamic range, and other analytical performance metrics.
How did the authors use it in these studies?
In addition to probes targeting genes of interest, both Xenium and CosMx include negative control probes. In Cook et al. the authors assessed specificity by comparing the ratio of the median number of transcripts detected from gene-targeting probes to the median number of transcripts from negative control probes.
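As a rough illustration of this metric, the sketch below computes the ratio of the median count across gene-targeting probes to the median count across negative control probes. The function name, the probe names, and the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import median


def median_signal_to_noise(counts_per_probe, negative_probes):
    """Median count of gene-targeting probes divided by median count of negative controls."""
    gene = [c for name, c in counts_per_probe.items() if name not in negative_probes]
    neg = [c for name, c in counts_per_probe.items() if name in negative_probes]
    if not gene or not neg:
        raise ValueError("need at least one gene probe and one negative control probe")
    neg_median = median(neg)
    if neg_median == 0:
        return float("inf")  # no detectable background in the controls
    return median(gene) / neg_median


# Hypothetical total transcript counts per probe across a tissue section.
counts = {"EPCAM": 900, "CD3E": 300, "CD19": 600, "NegPrb1": 2, "NegPrb2": 4}
ratio = median_signal_to_noise(counts, negative_probes={"NegPrb1", "NegPrb2"})
```

Here the gene median is 600 and the control median is 3, so the ratio is 200:1. Using medians rather than means keeps a handful of very highly expressed genes from dominating the comparison.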
Wang et al. noted that, since Vizgen MERSCOPE does not incorporate negative control probes by default, they had to take a different approach. In order to determine specificity across all three platforms, they first compared on-target transcript counts to the fraction of negative control barcodes (i.e., detectable barcodes that do not correspond to any gene probe in their specific study).
Wang et al. also leveraged the fact that Xenium and CosMx incorporate both decoding controls and negative control probes to calculate the false discovery rate (FDR) for each platform. They noted that, while the relative number of controls and barcodes introduced bias in their earlier calculations, the FDR normalizes for this confounding factor.
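One way to see why an FDR of this kind corrects for panel design is to write it out. The sketch below assumes a common form of the calculation, in which the fraction of calls that come from controls is scaled by the ratio of gene probes to control probes; this formula, the function name, and the numbers are assumptions for illustration and may differ from the exact definition used in the study.

```python
def false_discovery_rate_pct(gene_calls, control_calls, n_gene_probes, n_control_probes):
    """Estimated FDR (%) under the assumed normalization.

    control_calls / total_calls  : share of all calls landing on controls
    n_gene_probes / n_controls   : rescales for how many controls the panel carries
    """
    total_calls = gene_calls + control_calls
    if total_calls == 0 or n_control_probes == 0:
        raise ValueError("need at least one call and one control probe")
    control_fraction = control_calls / total_calls
    return control_fraction * (n_gene_probes / n_control_probes) * 100.0


# Hypothetical panel A: 300 gene probes, 30 controls.
fdr_a = false_discovery_rate_pct(999_900, 100, n_gene_probes=300, n_control_probes=30)

# Hypothetical panel B: same per-control noise, but twice as many controls.
fdr_b = false_discovery_rate_pct(999_800, 200, n_gene_probes=300, n_control_probes=60)
```

Panel B records twice as many control calls simply because it carries twice as many controls, yet both panels come out at about 0.1% under this normalization. A raw ratio of on-target to control counts would not be comparable between them in the same way.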
Cervilla et al. took a similar approach to Wang et al. and calculated the ratio of on-target gene probe counts to negative control probes and the FDR focused on genes shared across both Xenium and CosMx.
What did they find?
Cook et al. found that the ratio of median transcripts detected from gene-targeting probes to median transcripts from negative control probes was 4.7:1 in CosMx data; in Xenium data, however, this ratio was 372:1 (Figure 3).
Similarly, Wang et al. found that in all three tissue types tested, Xenium and MERSCOPE showed higher proportions of on-target calls than CosMx. Xenium also demonstrated a consistently lower FDR (i.e., higher specificity) than both CosMx and MERSCOPE across every tissue and panel type (Figure 4).
Consistent with the two prior studies, Cervilla et al. observed that Xenium had a higher ratio of gene-detecting probes to negative control probes (450:1) than CosMx (3:1). When examining FDR, which normalizes for the relative number of controls and barcodes in each technology, they found that Xenium had roughly 200-fold lower FDR (0.01%) than CosMx (2%), indicating higher specificity (Figure 5).
What was the authors’ takeaway?
“Xenium and MERSCOPE also showed consistently high specificity across tissue types. CosMx displayed a characteristic upward curve when compared to MERSCOPE or Xenium on a gene-by-gene basis, indicating more frequent calls in the lower expression regime. This, coupled with the lower specificity across several tissues for CosMx and the high false discovery rate, suggest that CosMx is prone to errors in calling lowly expressed genes.” –Wang et al.
Why does it matter?
Using a well-established orthogonal technology—such as RNA-seq, single nuclei RNA-seq (snRNA-seq), or NGS-based spatial transcriptomics—allows researchers to ensure relative gene expression levels, overall sensitivity, and the established assay dynamic range are reflective of true biology and not simply noise.
How did the authors use it in these studies?
Cook et al. performed snRNA-seq on tissue sections adjacent to those used for the Xenium and CosMx analyses for their orthogonal validation.
Wang et al. generated pseudobulk gene expression data from the tissue microarrays run on each platform, then compared this to bulk RNA-seq data from cancer (TCGA) and normal (GTEx) tissue samples that matched the cancer and tissue types used in the spatial imaging assays.
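The pseudobulk comparison can be sketched as follows: sum transcript counts over all cells to get one value per gene, then correlate those values with bulk RNA-seq on the genes both data sets share. The log10(x + 1) transform, the use of Pearson correlation, and all names and numbers below are assumptions of this example, not details confirmed by the study.

```python
import math


def pseudobulk(cells):
    """Sum per-cell {gene: count} dicts into one tissue-level {gene: count} dict."""
    totals = {}
    for counts in cells:
        for gene, n in counts.items():
            totals[gene] = totals.get(gene, 0) + n
    return totals


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def log_correlation(pseudo, bulk):
    """Correlate log10(x + 1) expression on genes present in both data sets."""
    shared = sorted(set(pseudo) & set(bulk))
    xs = [math.log10(pseudo[g] + 1) for g in shared]
    ys = [math.log10(bulk[g] + 1) for g in shared]
    return pearson(xs, ys)


# Hypothetical spatial cells and bulk RNA-seq values.
cells = [{"EPCAM": 9, "CD3E": 1}, {"EPCAM": 90, "KRT8": 99}]
pb = pseudobulk(cells)
bulk = {"EPCAM": 999, "CD3E": 9, "KRT8": 999, "ACTB": 5000}
r = log_correlation(pb, bulk)
```

Working in log space matters for this kind of comparison because it exposes the low-expression end of the range, where inflated counts from noise would otherwise be hidden by a few highly expressed genes.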
Cervilla et al. used the Visium Spatial Gene Expression platform, coupled with the Visium CytAssist instrument, to generate NGS-based spatial transcriptomics data and compared this to Xenium and CosMx data.
What did they find?
The Cook group found that Xenium data correlated well with snRNA-seq data across the whole dynamic range, and also offered a greater dynamic range and increased sensitivity over CosMx (Figure 6, top panel). In examining the CosMx data, the authors noted, “...a noticeable inflation of lowly expressed genes, contributing to its compressed dynamic range.”
Wang et al. highlighted good correlation between Xenium, MERSCOPE, and CosMx and bulk RNA-seq data (Figure 6, bottom panel). Their data also demonstrated the same potential issue with low-expression transcripts, reporting, “...the presence of a characteristic upswing for CosMx, even when comparing to orthogonal data, further shows that there is a higher false positive rate for lower expression level genes in the CosMx data.”
Wang et al. also made a point of comparing sensitivity between the panels. Specifically, they found that, “Xenium showed 2.6-fold higher median expression with the breast panels (10 μm) than MERSCOPE, and 13.6-fold higher median expression with the lung panels (5 μm).” Xenium's advantage over CosMx was similarly large, with Wang et al. stating that, “Xenium consistently showed higher expression levels on the same genes than CosMx in the tumor TMA, with the Xenium breast having 14.6-fold more counts than the CosMx multi-tissue data sets. The Xenium multi-tissue panel data showed a slightly smaller difference, with 12.3-fold higher expression on the same genes, while the lung panel, which was acquired closest in time following slicing, also displayed a median of 14.0-fold higher expression.”
While Cervilla et al. showed exceptionally high correlation for mean UMI counts between Visium and Xenium (R = 0.96), the correlation was lower between Visium and CosMx (R = 0.75) (Figure 7, top panel). They remarked that the lower correlation in CosMx, “...was largely driven by lowly expressed genes (average UMI count per gene < 0.1), as they [CosMx] have an average UMI count virtually indistinguishable from the negative probes.” Consistent with these findings, after comparing pseudobulk counts (pseudocounts) from tissue areas captured by Visium and by Xenium and/or CosMx, they observed that Xenium had higher spatial correlation with Visium than CosMx did.
What was the authors’ takeaway?
“We then compared transcript abundances from the two platforms to the matched snPATHO-seq. Although these counts were averaged across the entire population, we observed that the Xenium data correlated well with the snPATHO-seq data across its entire range of detection. In contrast, the CosMx data exhibited a noticeable inflation of lowly expressed genes, contributing to its compressed dynamic range. As a result, genes varying three orders of magnitude in the snPATHO-seq data had similar detection in the CosMx data. This is compounded by a reduced sensitivity for highly expressed genes, which were recovered at lower levels than observed in the snPATHO-seq.” –Cook et al.
Why does it matter?
The total number of genes (plexity) on a panel can bias some metrics, including the total number of unique genes detected per cell and the total number of transcripts detected (both in single cells and across the tissue), especially if the authors do not account for noise. Analyzing the same complement(s) of genes across platforms, as well as transcripts above noise, helps provide a more direct comparison of performance.
How did the authors use it in these studies?
Cook et al. explicitly noted the number of overlapping genes (125) in addition to total panel size (377 genes for Xenium and 960 genes for CosMx). When comparing metrics that may be influenced by plexity size (such as number of unique genes detected per cell and median transcripts per gene), they provided the overall data, data normalized by number of targets, and data compared across only shared genes.
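Restricting a comparison to shared genes is simple to express in code. The sketch below intersects two panels and reports the median per-gene fold difference on the overlap; the function name, gene lists, and expression values are hypothetical, and the median fold change is one reasonable summary rather than the specific statistic any of the three studies reported.

```python
from statistics import median


def shared_gene_fold_change(mean_counts_a, mean_counts_b):
    """Median of per-gene ratios (A / B), computed only on genes present in both panels."""
    shared = sorted(set(mean_counts_a) & set(mean_counts_b))
    ratios = [mean_counts_a[g] / mean_counts_b[g] for g in shared if mean_counts_b[g] > 0]
    if not ratios:
        raise ValueError("no shared genes with non-zero expression in panel B")
    return shared, median(ratios)


# Hypothetical mean transcripts per cell for two panels of different plexity.
platform_a = {"EPCAM": 12.0, "CD3E": 3.0, "KRT8": 8.0, "MKI67": 1.0}
platform_b = {"EPCAM": 4.0, "CD3E": 1.0, "KRT8": 4.0, "GZMA": 0.5}
shared, fold = shared_gene_fold_change(platform_a, platform_b)
```

Genes that appear on only one panel (MKI67 and GZMA here) drop out, so the result reflects per-gene detection rather than how many targets each panel happens to carry.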
Wang et al. compared multiple Xenium and MERSCOPE tissue panels to CosMx, noting that all six panels they focused on shared > 94 genes, with a high degree of similarity between the Xenium and MERSCOPE panels. Similar to the Cook study, this group also reported metrics from both total and shared genes.
Cervilla et al. compared the Xenium Human Multi-Tissue and Cancer panel (377 genes) to the CosMx 1,000-gene panel. They focused primarily on genes shared between the two panels (125 genes), but also reported metrics from both shared and total genes.
What did they find?
As expected, given its higher plexity, the CosMx panel identified a greater number of unique genes per cell than Xenium in the Cook study. However, when examining the median per-gene number of transcripts detected across all genes, Xenium resolved roughly twice as many transcripts per gene as CosMx. Consistent with these findings, Xenium also resolved three times as many transcripts per cell as CosMx when limited to only the 125 genes shared between panels (Figure 8A).
Wang et al. showed consistent results with the Cook study. When comparing Xenium, MERSCOPE, and CosMx, the higher-plex CosMx panel detected the highest absolute number of unique genes in 14 of the 20 tested tissue types when considering all panel genes. However, Xenium not only detected a greater absolute number of genes in the 6 other sample types, but the greatest percentage of on-panel genes across all 20 samples, followed by either MERSCOPE or CosMx (depending on sample type; Figure 8B).
The results from Cervilla et al. were concordant with both Cook et al. and Wang et al. Averaged across all tissue types analyzed, Xenium and CosMx gave a similar number of genes detected per cell (14 and 13, respectively; Figure 8C), while Xenium provided a greater total number of gene counts per cell than CosMx (37 and 26, respectively). The authors noted that, “CosMx has an important artifact on its FOVs, where cells that are at its border have an order of magnitude less counts than cells in the middle of the FOV,” so they calculated these metrics with those cells excluded.
What was the authors’ takeaway?
“When this analysis was also restricted to shared genes, we also found that Xenium consistently had higher expression levels across each tissue type, with no clear differences between performance on either tumor or normal tissue.” –Wang et al.
It is important not just to note the findings from these three independent studies, but also to highlight how concordant they were. In spite of using different tools for orthogonal benchmarking, different tissue types, and different approaches, all three studies reached concordant conclusions, as summarized in the key findings at the top of this article.
Additionally, all three groups have made their data publicly available for reanalysis and incorporation into future comparison studies.
A well-conducted study will highlight limitations in its experimental design. For example:
Cook et al. acknowledged a limited sample size, consisting of a single tissue type from a single patient (though the authors are following up with a much larger study). This is an important consideration given the potential impact of tissue type and complexity on assay performance. Furthermore, this study—as well as Wang et al.—focused on FFPE tissue and did not assess performance on fresh frozen samples (another commonly used preservation method for spatial assays).
Wang et al. acknowledged that unequal amounts of time elapsed between sectioning and imaging across their tissues. While they attempted to pair timepoints for each technology, they initially sectioned MERSCOPE tissue thinner than the manufacturer’s recommendations, resulting in a potentially unequal comparison between Xenium and MERSCOPE at matched timepoints (though they ran another set of samples that followed the manufacturer’s recommended thickness). They also noted that making the MERSCOPE panel compatible with all tissues necessitated the removal of several genes, potentially impacting its performance versus Xenium. We focused much of the comparative data in this piece on the samples that showed minimal variance.
Cervilla et al. acknowledged how quickly spatial transcriptomics moves, and that comparison data must be looked at in the context of the study date. As examples, they pointed out that Xenium now offers multimodal membrane staining-based cell segmentation as well as 5,000-gene panels, which were not available at the time of their study. Additionally, they pointed out prior studies (namely Cook et al. and Wang et al.) had not used matching Visium data, so they can only directly compare the performance of Xenium and CosMx. Their group also acknowledged that, while the original Visium had comparatively low resolution (especially in the context of imaging-based spatial transcriptomics platforms), the new Visium HD platform should overcome any resolution issues.
This article takes a deep dive into three recent independent benchmarking studies that used differing methods but reached similar conclusions. Two other recent studies have also been conducted.
The first is an independent study from Hartman and Satija at the New York Genome Center. Their work compared 6 different single cell spatial imaging technologies (including Xenium and MERSCOPE) in mouse brain using a Baysor-based cell segmentation strategy. The authors pointed out several limitations in their study, specifically that they focused on a single anatomical feature (cortex) in a single tissue type (fresh frozen mouse brain). This work used publicly available datasets for both single cell spatial imaging platforms and scRNA-seq analyses, not side-by-side comparisons from the same mice or neuroanatomical location.
The second is a NanoString-sponsored study from the Dulai group at Northwestern University. Their study compares the performance of Xenium and CosMx in FFPE ileal and rectal human biopsies. While the webinar did not touch on limitations or caveats in this study, several potential points to consider are:
Single cell spatial imaging technologies are rapidly evolving with advances in plexity, analytical methods, cell segmentation modalities, software, and more. These studies represent what was available to the authors at the time of the study, and understanding the advances that have been made since publication is critical to ensure you choose the platform that best fits your needs.
Looking to go deeper with comparisons between single cell spatial imaging platforms? Watch David Cook and Dr. Luciano Martelotto’s webinar below!