Independent technology comparisons

Benchmarking comparisons: Highlighting spatial imaging comparison studies

When considering a single cell spatial imaging platform, independent researcher-led studies that compare assay performance can be helpful. Here, we highlight three recent studies (Cook et al., Wang et al., and Cervilla et al.) that compare 10x Genomics’ Xenium In Situ with the NanoString CosMx Spatial Molecular Imager and/or Vizgen MERSCOPE platforms.

These studies, conducted by separate independent research groups, are deep dives on the different single cell spatial imaging platforms with a focus on:

  • Comparing transcript detection specificity
  • Validating assay performance using orthogonal technologies (RNA-seq and scRNA-seq) 
  • Performing apples-to-apples comparisons using shared genes
  • Evaluating cell clustering, segmentation, and spatial distribution of transcripts
  • Making their data publicly available for reanalysis

See a summary of key findings from their studies below, or jump down for a more in-depth overview of what the authors did and what they found!

Key findings

In these publications, the authors demonstrated that:

Xenium was better able to spatially resolve and discriminate cell types than competitors

  • Cook et al. showed that they were able to confidently detect both CD8-positive T cells and lymphocytes with Xenium, but unable to do so with CosMx
  • Wang et al. found that:
    • Both Xenium and MERSCOPE clearly separated 6 lineages denoted by non-overlapping marker genes, but CosMx could only distinguish 2
    • Xenium consistently identified more cell types than CosMx and MERSCOPE in several tissue types
  • Cervilla et al. noted that:
    • Xenium demonstrated greater spatial clustering of cell types, and only Xenium was able to clearly cluster CD8-positive T cells
    • They had to exclude cells at the CosMx field of view (FOV) edge from analysis because, “CosMx has an important artifact on its FOVs, where cells that are at its border have an order of magnitude less counts than cells in the middle of the FOV.”

Xenium showed greater specificity

  • Cook et al. observed ratios of gene-targeting transcripts to negative control probe counts of 372:1 for Xenium and 4.7:1 for CosMx
  • Wang et al. showed Xenium was consistently more specific (measured by false discovery rate (FDR)) than MERSCOPE and CosMx in every tissue and panel type tested
  • Cervilla et al. determined Xenium was more specific than CosMx, both via FDR (Xenium = 0.01%; CosMx = 2%) and by ratio of gene probe versus negative control probe counts (Xenium = 450:1; CosMx = 3:1)

Xenium exhibited greater consistency with orthogonal benchmarking

  • Cook et al. showed that Xenium correlated well with snRNA-seq, but CosMx showed inflated counts of low-expression genes
  • Wang et al. noted that Xenium showed 2.6- to 13.6-fold higher gene expression than MERSCOPE and 12.3- to 14.6-fold higher expression than CosMx
  • Cervilla et al. found that, when compared to Visium, Xenium showed a much higher correlation of mean gene expression (R = 0.96) than CosMx (R = 0.75) due to inflated low-expression gene counts in CosMx

Xenium detected more transcripts on shared genes

  • Cook et al. observed that Xenium detected ~2x as many transcripts per gene as CosMx; when limited to genes shared between platforms, Xenium detected ~3x as many transcripts per cell
  • Wang et al. saw that, despite lower plexity than CosMx, Xenium detected more total unique genes in 6 of 20 tissue types than MERSCOPE and CosMx, and also detected the greatest percentage of on-panel genes 
  • Cervilla et al. found that, when limited to shared genes, Xenium detected more gene counts per cell than CosMx (Xenium = 37; CosMx = 26)

Compared segmentation, cell clustering, and spatial organization

Why does it matter?

Single cell spatial imaging technologies require both reliable cell segmentation and transcript detection to ensure you’re capturing single cells and assigning correct transcripts to them. 

How did the authors use it in these studies? 

Taking advantage of its cellular heterogeneity, Cook et al. focused on the stroma and examined epithelial, fibroblast, muscle, and macrophage markers (among others) in segmented cells. They characterized spatial organization based on characterized cell types in tissue. Finally, they attempted to detect a rare cell type (CD8 T cells, defined by coexpression of CD3, CD8A, and GZMA).

Wang et al. first examined the expression of pairs of non-overlapping marker genes, rationalizing that well-segmented cells should have a low proportion of cells expressing both genes in a pair. They also looked at the cell types each platform was able to resolve in several tissue types based on canonical markers.
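
The marker-pair check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name, the `min_count` threshold, the choice of denominator (cells expressing at least one marker of the pair), and the toy cells are all assumptions made for the example.

```python
def coexpression_fraction(cells, gene_a, gene_b, min_count=1):
    """Fraction of cells expressing either marker that express both.

    `cells` is a list of dicts mapping gene name -> transcript count.
    `min_count` is a hypothetical threshold for calling a gene "expressed".
    """
    either = 0
    both = 0
    for counts in cells:
        has_a = counts.get(gene_a, 0) >= min_count
        has_b = counts.get(gene_b, 0) >= min_count
        if has_a or has_b:
            either += 1
        if has_a and has_b:
            both += 1
    # Avoid dividing by zero when neither marker is detected anywhere.
    return both / either if either else 0.0


# Toy data (invented for illustration): four segmented cells.
toy_cells = [
    {"CD19": 5, "CD3E": 0},  # looks like a B cell
    {"CD19": 0, "CD3E": 4},  # looks like a T cell
    {"CD19": 2, "CD3E": 3},  # both markers: possible segmentation error or noise
    {"EPCAM": 7},            # expresses neither marker of the pair
]

fraction = coexpression_fraction(toy_cells, "CD19", "CD3E")
print(f"Coexpressing fraction: {fraction:.2f}")
```

A lower fraction suggests cleaner separation of the two lineages; a higher fraction can point to segmentation spillover or false-positive calls.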

Cervilla et al. characterized the relative abundance of cell types in each platform, as well as whether the cell types characterized in each technology exhibited patterns of spatial clustering. They also examined the accuracy of cell-typing composition as a function of total tissue area analyzed.

What did they find?

Both studies highlighted that the nuclear boundary expansion used by Xenium for cell segmentation* resulted in cells that were consistently larger than those in the CosMx and MERSCOPE data. In spite of this, the Wang group found that Xenium was able to clearly separate lineages denoted by non-overlapping marker genes (CD19 and CD3E for B and T cells, CD8 and CD4 for T cell subsets, and CD3E and EPCAM for epithelial cancer). While MERSCOPE was also able to resolve these lineages, CosMx, though able to distinguish EPCAM versus CD3E, could not differentiate between the other pairs (which the authors hypothesized may be due to false positives or cell segmentation errors, given the low counts for these immune genes).

Wang et al. found that Xenium consistently captured more cell types than CosMx (Figure 1). Xenium identified 9 distinct cell types per breast, lung, and breast cancer sample. CosMx only resolved 6, 6, and 8 cell types, respectively, and was unable to resolve all known major cell types in breast and lung. MERSCOPE identified 6 cell types in breast cancer, and researchers noted, “...a clearer one-to-one mapping between MERSCOPE and Xenium clusters than Xenium and CosMx clusters.”

Using snRNA-seq as their ground truth, Cook et al. found a greater-than-expected proportion of macrophage and fibroblast markers in muscle cells in Xenium data. Similarly, endothelial, muscle, and fibroblast markers were seen in pericytes, which they hypothesized was due to neighboring cell contamination. In CosMx data, they noted reduced specificity of marker genes and stated that it could be due to, “...a lower abundance of on-target markers and higher noise levels, rather than segmentation errors.”

Cook et al. attempted to detect CD8 T cells using both Xenium and CosMx. While Xenium identified a corresponding population, this population was absent in the CosMx data (Figure 2A). Additionally, while markers of lymphocytes—small cells with low RNA content—exhibited punctate expression in tissue on Xenium, in CosMx SMI these markers were distributed throughout the tissue and insufficiently expressed for cell characterization (Figure 2B). 

Finally, Cervilla et al. noted that the gross characterization of cell types was broadly similar between platforms. When looking at the spatial distribution of cell types, they noticed that, while both technologies were able to characterize areas that were rich in CD8-positive T cells, only Xenium was able to demonstrate clear spatial clustering of this cell type, likely due to the increased noise in CosMx versus Xenium (Figure 2C). They also found that there was a correlation between the total amount of tissue analyzed and how well it recapitulated the cell composition of the entire tissue.

What was the authors’ takeaway?

“Xenium does seem to have a lower false discovery rate than CosMx, particularly in lowly expressed genes. We also showed that this affects the actual spatial distribution of the signal of many genes, as reflected by their Moran’s I, which tends to be lower in CosMx than in Xenium.” –Cervilla et al.
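
For readers unfamiliar with Moran’s I, the statistic mentioned in the quote above, here is a minimal sketch of how it can be computed. It assumes binary spatial weights (1 for any two points within a chosen `radius`, 0 otherwise); the weighting scheme the authors actually used is not described here, and the coordinates and values are toy data.

```python
import math


def morans_i(coords, values, radius):
    """Moran's I with binary weights: w_ij = 1 if points i and j lie within `radius`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                num += dev[i] * dev[j]
                weight_sum += 1.0
    if denom == 0 or weight_sum == 0:
        return float("nan")  # undefined for constant values or no neighbors
    return (n / weight_sum) * (num / denom)


# Toy data: six points on a line, one unit apart.
line = [(float(i), 0.0) for i in range(6)]
clustered = morans_i(line, [1, 1, 1, 0, 0, 0], radius=1.0)
alternating = morans_i(line, [1, 0, 1, 0, 1, 0], radius=1.0)
print(f"clustered: {clustered:.2f}, alternating: {alternating:.2f}")
```

Values near 1 indicate that similar expression levels sit next to each other, values near 0 indicate no spatial structure, and negative values indicate a checkerboard-like pattern. Spatially random noise pulls the statistic toward 0.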

*Note: When these studies were performed, the default Xenium cell segmentation was a 15 µm nuclear expansion. Since then, this distance has been reduced to 5 µm, and Xenium now has the option for morphology-driven cell segmentation with the Xenium Multimodal Cell Segmentation Kit.

Compared assay specificity to guide interpretation of results

Why does it matter? 

Specificity—defined in this case as the proportion of called transcripts being true biological transcripts rather than noise—acts as the basis for accurate interpretation of sensitivity (see below), dynamic range, and other analytical performance metrics.

How did the authors use it in these studies? 

In addition to probes targeting genes of interest, both Xenium and CosMx include negative control probes. In Cook et al. the authors assessed specificity by comparing the ratio of the median number of transcripts detected from gene-targeting probes to the median number of transcripts from negative control probes.
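
As a rough illustration of this ratio, the sketch below divides the median count across gene-targeting probes by the median count across negative control probes. The function name, probe names, and counts are invented for the example and do not come from any of the studies.

```python
from statistics import median


def specificity_ratio(gene_counts, negative_control_counts):
    """Median transcripts per gene probe divided by median per negative control probe."""
    gene_median = median(gene_counts.values())
    control_median = median(negative_control_counts.values())
    if control_median == 0:
        return float("inf")  # no background detected at the median
    return gene_median / control_median


# Toy totals per probe (invented for illustration).
toy_gene_counts = {"GENE_A": 1200, "GENE_B": 800, "GENE_C": 400}
toy_control_counts = {"NegPrb1": 3, "NegPrb2": 5, "NegPrb3": 4}

ratio = specificity_ratio(toy_gene_counts, toy_control_counts)
print(f"Signal-to-background ratio: {ratio:.0f}:1")
```

Using medians rather than means keeps a handful of very highly expressed genes, or one noisy control probe, from dominating the ratio.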

Wang et al. noted that, since Vizgen MERSCOPE does not incorporate negative control probes by default, they had to take a different approach. To determine specificity across all three platforms, they first compared on-target transcript counts to the fraction of negative control barcodes (i.e., detectable barcodes that do not correspond to any gene probe in their specific study).

Wang et al. also leveraged the fact that Xenium and CosMx incorporate both decoding controls and negative control probes to calculate the false discovery rate (FDR) for each platform. They noted that, while the relative number of controls and barcodes introduced bias in their earlier calculations, the FDR normalizes for this confounding factor.
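
To show why this kind of normalization matters, here is one plausible way to write such an FDR. The exact formula used in the paper is not reproduced in this article, so treat the form below as an assumption: the share of calls that come from controls, scaled by the ratio of gene targets to control targets so that panels with few controls are not flattered. All numbers are toy values.

```python
def false_discovery_rate_percent(control_calls, gene_calls, n_gene_targets, n_control_targets):
    """Assumed FDR form: control share of all calls, scaled by targets per control."""
    total_calls = control_calls + gene_calls
    if total_calls == 0 or n_control_targets == 0:
        raise ValueError("need at least one call and one control target")
    control_fraction = control_calls / total_calls
    # Scale up as if there were as many control targets as gene targets.
    scale = n_gene_targets / n_control_targets
    return control_fraction * scale * 100


# Toy example: 50 control calls among 100,000 total calls,
# on a panel with 300 gene targets and 30 control targets.
fdr = false_discovery_rate_percent(
    control_calls=50, gene_calls=99_950, n_gene_targets=300, n_control_targets=30
)
print(f"Estimated FDR: {fdr:.2f}%")
```

Without the scaling term, the same data would suggest 0.05% of calls are false; accounting for the tenfold difference in target numbers raises the estimate to 0.5%.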

Cervilla et al. took a similar approach to Wang et al. and calculated the ratio of on-target gene probe counts to negative control probes and the FDR focused on genes shared across both Xenium and CosMx. 

What did they find? 

Cook et al. found that the ratio of the median gene-targeting transcript count to the median negative control probe count was 4.7:1 in CosMx data; in Xenium data, however, this ratio was 372:1 (Figure 3).

Similarly, Wang et al. found that in all three tissue types tested, Xenium and MERSCOPE showed higher proportions of on-target calls than CosMx. Xenium also demonstrated a consistently lower FDR (i.e., higher specificity) than both CosMx and MERSCOPE across every tissue and panel type (Figure 4).

Consistent with the two prior studies, Cervilla et al. observed that Xenium had a higher ratio of gene-detecting probes to negative control probes (450:1) than CosMx (3:1). When examining FDR, which normalizes for the relative number of controls and barcodes in each technology, they found that Xenium had roughly 200-fold lower FDR (0.01%) than CosMx (2%), indicating higher specificity (Figure 5).

What was the authors’ takeaway?

“Xenium and MERSCOPE also showed consistently high specificity across tissue types. CosMx displayed a characteristic upward curve when compared to MERSCOPE or Xenium on a gene-by-gene basis, indicating more frequent calls in the lower expression regime. This, coupled with the lower specificity across several tissues for CosMx and the high false discovery rate, suggest that CosMx is prone to errors in calling lowly expressed genes.” –Wang et al.

Established ground truths by benchmarking with orthogonal technologies

Why does it matter?

Using a well-established orthogonal technology—such as RNA-seq, single nuclei RNA-seq (snRNA-seq), or NGS-based spatial transcriptomics—allows researchers to ensure relative gene expression levels, overall sensitivity, and the established assay dynamic range are reflective of true biology and not simply noise.

How did the authors use it in these studies?

Cook et al. performed snRNA-seq on tissue sections adjacent to those used for the Xenium and CosMx analyses for their orthogonal validation. 

Wang et al. generated pseudobulk gene expression data from the tissue microarrays run on each platform, then compared this to bulk RNA-seq data from cancer (TCGA) and normal (GTEx) tissue samples that matched the cancer and tissue types used in the spatial imaging assays.

Cervilla et al. used the Visium Spatial Gene Expression platform, coupled with the Visium CytAssist instrument, to generate NGS-based spatial transcriptomics data and compared this to Xenium and CosMx data.
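
The general idea behind these comparisons can be sketched as follows: sum per-cell counts into a pseudobulk profile, then correlate it with a reference profile over the genes both share. This is a simplified illustration under stated assumptions (log10 of counts plus one, Pearson correlation, toy numbers) and not the pipeline used in any of the three studies.

```python
import math


def pseudobulk(cells):
    """Sum per-cell counts (list of gene -> count dicts) into one profile."""
    totals = {}
    for counts in cells:
        for gene, n in counts.items():
            totals[gene] = totals.get(gene, 0) + n
    return totals


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_correlation(spatial, reference):
    """Correlate log10(count + 1) over genes present in both profiles."""
    shared = sorted(set(spatial) & set(reference))
    xs = [math.log10(spatial[g] + 1) for g in shared]
    ys = [math.log10(reference[g] + 1) for g in shared]
    return pearson(xs, ys)


# Toy data (invented): two cells from a spatial assay and a reference profile.
toy_cells = [{"A": 4, "B": 50, "C": 500}, {"A": 5, "B": 49, "C": 499}]
toy_reference = {"A": 99, "B": 999, "C": 9999, "D": 12}  # "D" is off-panel

profile = pseudobulk(toy_cells)
r = log_correlation(profile, toy_reference)
print(f"Pearson r on log-scaled shared genes: {r:.2f}")
```

In real data, inflated counts for lowly expressed genes would flatten the lower end of this relationship and reduce the correlation.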

What did they find?

The Cook group found that Xenium data correlated well with snRNA-seq data across the whole dynamic range, and also offered a greater dynamic range and increased sensitivity over CosMx (Figure 6, top panel). In examining the CosMx data, the authors noted, "...a noticeable inflation of lowly expressed genes, contributing to its compressed dynamic range."

Wang et al. highlighted good correlation between Xenium, MERSCOPE, and CosMx and bulk RNA-seq data (Figure 6, bottom panel). Their data also demonstrated the same potential issue with low-expression transcripts, reporting, “...the presence of a characteristic upswing for CosMx, even when comparing to orthogonal data, further shows that there is a higher false positive rate for lower expression level genes in the CosMx data.” 

Wang et al. also made a point of comparing sensitivity between the panels. Specifically, they found that, “Xenium showed 2.6-fold higher median expression with the breast panels (10 μm) than MERSCOPE, and 13.6-fold higher median expression with the lung panels (5 μm).” The difference was even larger relative to CosMx, with Wang et al. stating that, “Xenium consistently showed higher expression levels on the same genes than CosMx in the tumor TMA, with the Xenium breast having 14.6-fold more counts than the CosMx multi-tissue data sets. The Xenium multi-tissue panel data showed a slightly smaller difference, with 12.3-fold higher expression on the same genes, while the lung panel, which was acquired closest in time following slicing, also displayed a median of 14.0-fold higher expression.”

While Cervilla et al. showed exceptionally high correlation for mean UMI counts between Visium and Xenium (R = 0.96), the correlation was lower between Visium and CosMx (R = 0.75) (Figure 7, top panel). They remarked that the lower correlation in CosMx, “...was largely driven by lowly expressed genes (average UMI count per gene < 0.1), as they [CosMx] have an average UMI count virtually indistinguishable from the negative probes.” Consistent with these findings, after comparing pseudobulk counts (pseudocounts) from tissue areas captured by Visium and by Xenium and/or CosMx, they observed that Xenium had higher spatial correlation with Visium than CosMx did.

What was the authors’ takeaway?

“We then compared transcript abundances from the two platforms to the matched snPATHO-seq. Although these counts were averaged across the entire population, we observed that the Xenium data correlated well with the snPATHO-seq data across its entire range of detection. In contrast, the CosMx data exhibited a noticeable inflation of lowly expressed genes, contributing to its compressed dynamic range. As a result, genes varying three orders of magnitude in the snPATHO-seq data had similar detection in the CosMx data. This is compounded by a reduced sensitivity for highly expressed genes, which were recovered at lower levels than observed in the snPATHO-seq.” –Cook et al.

Compared performance using overlapping genes

Why does it matter?

The total number of genes (plexity) on a panel can bias some metrics, including the total number of unique genes detected per cell and the total number of transcripts detected (both in single cells and across the tissue), especially if the authors do not account for noise. Analyzing the same complement(s) of genes across platforms, as well as transcripts above noise, helps provide a more direct comparison of performance.

How did the authors use it in these studies?

Cook et al. explicitly noted the number of overlapping genes (125) in addition to total panel size (377 genes for Xenium and 960 genes for CosMx). When comparing metrics that may be influenced by plexity (such as number of unique genes detected per cell and median transcripts per gene), they provided the overall data, data normalized by number of targets, and data compared across only shared genes.

Wang et al. compared multiple Xenium and MERSCOPE tissue panels to CosMx, noting that all six of the panels they focused on shared > 94 genes, with a high degree of similarity between the Xenium and MERSCOPE panels. Similar to the Cook study, this group also reported metrics from both total and shared genes.

Cervilla et al. compared the Xenium Human Multi-Tissue and Cancer panel (377 genes) to the CosMx 1,000-gene panel. They focused primarily on genes shared between the two panels (125 genes), but also reported metrics from both shared and total genes.
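
To make the effect of restricting to shared genes concrete, here is a small sketch. The panels, gene lists, and counts are invented for illustration (a small "panel A" and a larger "panel B"); only the idea of intersecting panels before computing per-cell metrics comes from the studies.

```python
from statistics import median


def median_counts_per_cell(cells, genes):
    """Median over cells of the total count summed across `genes`."""
    per_cell = [sum(counts.get(g, 0) for g in genes) for counts in cells]
    return median(per_cell)


# Hypothetical panels: panel B has higher plexity than panel A.
panel_a_genes = {"CD3E", "CD8A", "EPCAM"}
panel_b_genes = {"CD3E", "CD8A", "EPCAM", "KRT8", "VIM"}
shared_genes = panel_a_genes & panel_b_genes

# Toy per-cell counts (gene -> transcripts) for each platform.
cells_a = [{"CD3E": 10, "CD8A": 5}, {"EPCAM": 20}, {"CD3E": 8, "EPCAM": 1}]
cells_b = [{"CD3E": 3, "VIM": 30}, {"EPCAM": 6, "KRT8": 12}, {"CD8A": 2, "VIM": 25}]

all_a = median_counts_per_cell(cells_a, panel_a_genes)
all_b = median_counts_per_cell(cells_b, panel_b_genes)
shared_a = median_counts_per_cell(cells_a, shared_genes)
shared_b = median_counts_per_cell(cells_b, shared_genes)

print(f"All panel genes:   A = {all_a}, B = {all_b}")
print(f"Shared genes only: A = {shared_a}, B = {shared_b}")
```

In this toy case, the larger panel looks better when every gene is counted, but the ranking flips once both platforms are measured on the same genes, which is why the shared-gene view is the fairer comparison.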

What did they find?

As expected, given its higher plexity, the CosMx panel identified a greater number of unique genes per cell than Xenium in the Cook study. However, when examining the median per-gene number of transcripts detected across all genes, Xenium resolved roughly twice as many transcripts per gene as CosMx. Consistent with these findings, Xenium also resolved three times as many transcripts per cell as CosMx when limited to only the 125 genes shared between panels (Figure 8A).

Wang et al. showed consistent results with the Cook study. When comparing Xenium, MERSCOPE, and CosMx, the higher-plex CosMx panel detected the highest absolute number of unique genes in 14 of the 20 tested tissue types when considering all panel genes. However, Xenium not only detected a greater absolute number of genes in the 6 other sample types, but the greatest percentage of on-panel genes across all 20 samples, followed by either MERSCOPE or CosMx (depending on sample type; Figure 8B).

The findings of Cervilla et al. were concordant with both Cook et al. and Wang et al. Averaged across all tissue types analyzed, Xenium and CosMx gave a similar number of genes detected per cell (14 and 13, respectively; Figure 8C), while Xenium provided a greater total number of gene counts per cell than CosMx (37 and 26, respectively). However, the authors noted that, “CosMx has an important artifact on its FOVs, where cells that are at its border have an order of magnitude less counts than cells in the middle of the FOV,” so they calculated these metrics with those cells excluded.

What was the authors’ takeaway?

“When this analysis was also restricted to shared genes, we also found that Xenium consistently had higher expression levels across each tissue type, with no clear differences between performance on either tumor or normal tissue.” –Wang et al.

Highlighting the concordance of these findings

It is important to not just note the findings from these three independent studies, but to highlight how concordant the findings were. In spite of using different tools for orthogonal benchmarking, different tissue types, and different approaches, all three studies found that:

  • Xenium exhibited higher specificity than CosMx (measured either by the ratio of on-target gene counts to negative control probe counts or by FDR)
  • Xenium correlated well with snRNA-seq, RNA-seq, and NGS-based spatial transcriptomics across its entire dynamic range, while CosMx had inflated counts in lower-expression transcripts (which was explicitly highlighted in all three manuscripts)
  • Xenium consistently detected more transcripts than CosMx on genes shared between both panels 
  • Xenium was able to detect cell types that CosMx did not, including lymphocytes and rare/difficult-to-detect cell types

Additionally, all three groups have made their data publicly available for reanalysis and incorporation into future comparison studies.

Touching on comparison study limitations

A well-conducted study will highlight the limitations of its experimental design. For example:

Cook et al. acknowledged a limited sample size, consisting of a single tissue type from a single patient (though the authors are following up with a much larger study). This is an important consideration given the potential impact of tissue type and complexity on assay performance. Furthermore, this study—as well as Wang et al.—focused on FFPE tissue and did not assess performance on fresh frozen samples (another commonly used preservation method for spatial assays). 

Wang et al. acknowledged that unequal times elapsed between sectioning and imaging their tissues. While they attempted to pair timepoints for each technology, they initially sectioned MERSCOPE tissue thinner than manufacturer’s recommendations, resulting in a potentially unequal comparison between Xenium and MERSCOPE at matched timepoints (though they ran another set of samples that followed the manufacturer’s recommended thickness). We focused much of the comparative data in this piece on the samples that showed minimal variance. They also noted that making the MERSCOPE panel compatible with all tissues necessitated the removal of several genes, potentially impacting its performance versus Xenium.

Cervilla et al. acknowledged how quickly spatial transcriptomics moves, and that comparison data must be looked at in the context of the study date. As examples, they pointed out that Xenium now offers multimodal membrane staining-based cell segmentation as well as 5,000-gene panels, which were not available at the time of their study. Additionally, they pointed out prior studies (namely Cook et al. and Wang et al.) had not used matching Visium data, so they can only directly compare the performance of Xenium and CosMx. Their group also acknowledged that, while the original Visium had comparatively low resolution (especially in the context of imaging-based spatial transcriptomics platforms), the new Visium HD platform should overcome any resolution issues.

Capturing a more complete picture: Acknowledging other comparison studies

This article takes a deep dive into three recent independent benchmarking studies that used differing methods but reached similar conclusions. Two other recent studies have also been conducted.

The first is an independent study from Hartman and Satija at the New York Genome Center. Their work compared 6 different single cell spatial imaging technologies (including Xenium and MERSCOPE) in mouse brain using a Baysor-based cell segmentation strategy. The authors pointed out several limitations in their study, specifically that they focused on a single anatomical feature (cortex) in a single tissue type (fresh frozen mouse brain). This work used publicly available datasets for both single cell spatial imaging platforms and scRNA-seq analyses, not side-by-side comparisons from the same mice or neuroanatomical location.

The second is a NanoString-sponsored study from the Dulai group at Northwestern University. Their study compares the performance of Xenium and CosMx in FFPE ileal and rectal human biopsies. While the webinar did not touch on limitations or caveats in this study, several potential points to consider are:

  • No data was provided for measuring assay specificity and/or noise (a relevant consideration given the overall lower specificity and “upward curve” observed in CosMx by Cook et al. and Wang et al.)
  • While a public scRNA-seq dataset was used to annotate cells, no orthogonal technology was used to provide a “ground truth” for comparing assay performance between both platforms and individual samples
  • The authors state that CosMx provides less variability in cell-type proportions than Xenium, but do not provide data on the accuracy of called cell types for each platform
  • No visualizations of cell clustering or cell-type annotation in UMAP or spatial distribution are shown, making it challenging to determine whether called cell types cluster together or are artifacts of noise
  • The authors state that CosMx performs consistently regardless of RNA quality; however, only 5 of 16 samples are shown, and no acknowledgement of a negative correlation with higher-quality samples in CosMx data is made
  • A claim is made that CosMx measures ~6 times more genes per cell than Xenium; this claim is not made by the webinar speaker and is derived from all genes on each panel (rather than overlapping genes) with no mention of the percentage of genes above noise 

Single cell spatial imaging technologies are rapidly evolving with advances in plexity, analytical methods, cell segmentation modalities, software, and more. These studies represent what was available to the authors at the time of the study, and understanding the advances that have been made since publication is critical to ensure you choose the platform that best fits your needs.

Looking to go deeper with comparisons between single cell spatial imaging platforms? Watch David Cook and Dr. Luciano Martelotto’s webinar below!

