Independent technology comparisons

Benchmarking comparisons: Highlighting spatial imaging comparison studies

 
 

When considering a single cell spatial imaging platform, independent researcher-led studies that compare assay performance can be helpful. Here, we highlight two recent studies (Cook et al. and Wang et al.) that compare 10x Genomics’ Xenium In Situ with the NanoString CosMx Spatial Molecular Imager and/or Vizgen MERSCOPE platforms. 

These studies, conducted by separate independent research groups, performed deep dives on the different single cell spatial imaging platforms with a focus on:

  • Comparing transcript detection specificity
  • Validating assay performance using orthogonal technologies (RNA-seq and scRNA-seq) 
  • Performing apples-to-apples comparisons using shared genes
  • Evaluating cell clustering, segmentation, and spatial distribution of transcripts
  • Making their data publicly available for reanalysis

See a summary of key findings from their studies below, or jump down for a more in-depth overview of what the authors did and what they found!

Key findings

In these publications, the authors demonstrated that:

Xenium showed greater specificity

  • Cook et al. observed a ratio of gene-detecting transcripts to negative control probes of 372:1 for Xenium and 4.7:1 for CosMx
  • Wang et al. showed Xenium was consistently more specific (measured by false discovery rate) than MERSCOPE and CosMx in every tissue and panel type tested

Xenium exhibited greater consistency with orthogonal benchmarking

  • Cook et al. showed that Xenium correlated well with snRNA-seq, but CosMx showed inflated counts of low-expression genes
  • Wang et al. noted that, on the same genes, Xenium showed 2.6- to 13.6-fold higher median expression than MERSCOPE and 12.3- to 14.6-fold higher expression than CosMx

Xenium detected more transcripts on shared genes

  • Cook et al. observed that Xenium detected ~2x as many transcripts per gene as CosMx; when limited to genes shared between platforms, Xenium detected ~3x as many transcripts per cell
  • Wang et al. saw that, despite lower plexity than CosMx, Xenium detected more total unique genes in 6 of 20 tissue types than MERSCOPE and CosMx, and also detected the greatest percentage of on-panel genes 

Xenium was better able to detect and discriminate cell types than competitors

  • Cook et al. confidently detected both CD8-positive T cells and lymphocytes in Xenium data, but could not do so in CosMx data
  • Wang et al. found that:
    • Both Xenium and MERSCOPE clearly separated 6 lineages denoted by non-overlapping marker genes, but CosMx could only distinguish 2
    • Xenium consistently identified more cell types than CosMx and MERSCOPE in several tissue types

Compared assay specificity to guide interpretation of results

Why does it matter? 

Specificity—defined in this case as the proportion of called transcripts being true biological transcripts rather than noise—acts as the basis for accurate interpretation of sensitivity (see below), dynamic range, and other analytical performance metrics.

How did the authors use it in these studies? 

In addition to probes targeting genes of interest, both Xenium In Situ and CosMx SMI include negative control probes. In Cook et al. the authors assessed specificity by comparing the ratio of the median number of transcripts detected from gene-targeting probes compared to the median number of transcripts from negative control probes.
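As a rough illustration of this ratio, the sketch below divides the median count across gene-targeting probes by the median count across negative control probes. The function name, probe names, and counts are hypothetical and are not data or code from Cook et al.; it assumes per-probe transcript totals have already been tallied.

```python
from statistics import median


def specificity_ratio(gene_counts, negative_control_counts):
    """Median count per gene-targeting probe divided by median count per negative control probe."""
    neg_median = median(negative_control_counts.values())
    if neg_median == 0:
        raise ValueError("median negative control count is zero; ratio is undefined")
    return median(gene_counts.values()) / neg_median


# Hypothetical per-probe totals (illustrative only, not values from either study).
gene_counts = {"EPCAM": 1200, "CD3E": 300, "PTPRC": 900}
negative_control_counts = {"NegProbe_1": 2, "NegProbe_2": 3, "NegProbe_3": 4}

ratio = specificity_ratio(gene_counts, negative_control_counts)
print(f"{ratio:.0f}:1")
```

A higher ratio means gene-targeting probes are detected far more often than probes that should detect nothing, which is the sense in which 372:1 indicates greater specificity than 4.7:1.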

Wang et al. noted that, since Vizgen MERSCOPE does not incorporate negative control probes by default, they had to take a different approach. To determine specificity across all three platforms, they first compared on-target transcript counts to the fraction of negative control barcodes (i.e., detectable barcodes that do not correspond to any gene probe in their specific study). 

Wang et al. also leveraged the fact that Xenium and CosMx incorporate both decoding controls and negative control probes to calculate the false discovery rate (FDR) for each platform. They noted that, while the relative number of controls and barcodes introduced bias in their earlier calculations, the FDR normalizes for this confounding factor.
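To make the idea of normalizing for the relative number of controls concrete, here is a minimal sketch of one plausible form of such an FDR calculation. The exact formula Wang et al. used is not given in this article, so the scaling below (control fraction multiplied by the ratio of panel genes to control targets) is an assumption, as are all names and numbers.

```python
def false_discovery_rate(control_calls, total_calls, n_panel_genes, n_control_targets):
    """Estimated percent of calls that are false positives.

    Assumed form: the fraction of all calls that land on controls is scaled by
    (number of panel genes / number of control targets), so that a platform
    with few controls is not credited with an artificially low error rate.
    total_calls is assumed to include both gene and control calls.
    """
    if total_calls <= 0 or n_control_targets <= 0:
        raise ValueError("total_calls and n_control_targets must be positive")
    control_fraction = control_calls / total_calls
    return control_fraction * (n_panel_genes / n_control_targets) * 100


# Hypothetical numbers (illustrative only).
fdr = false_discovery_rate(
    control_calls=50,
    total_calls=100_000,
    n_panel_genes=300,
    n_control_targets=30,
)
print(f"Estimated FDR: {fdr:.2f}%")
```

The scaling step is what lets panels with very different numbers of control targets be compared on the same footing.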

What did they find? 

Cook et al. found that the ratio of the median gene-detecting transcript count to the median negative control probe count was 4.7:1 in CosMx SMI; in Xenium, however, this ratio was 372:1 (Figure 1).

Similarly, Wang et al. found that in all three tissue types tested, Xenium and MERSCOPE showed higher proportions of on-target calls than CosMx. Xenium also demonstrated a consistently lower FDR (i.e., higher specificity) than both CosMx and MERSCOPE across every tissue and panel type (Figure 2).




What was the authors’ takeaway?

“Xenium and MERSCOPE also showed consistently high specificity across tissue types. CosMx displayed a characteristic upward curve when compared to MERSCOPE or Xenium on a gene-by-gene basis, indicating more frequent calls in the lower expression regime. This, coupled with the lower specificity across several tissues for CosMx and the high false discovery rate, suggest that CosMx is prone to errors in calling lowly expressed genes.” –Wang et al.

Established ground truths by benchmarking with orthogonal technologies

Why does it matter?

Using a well-established orthogonal technology, such as RNA-seq or single nuclei RNA-seq (snRNA-seq), allows researchers to ensure relative gene expression levels, overall sensitivity, and the established assay dynamic range are reflective of true biology and not simply noise.

How did the authors use it in these studies?

Cook et al. performed snRNA-seq on tissue sections adjacent to those used for the Xenium and CosMx analyses for their orthogonal validation. Wang et al. generated pseudobulk gene expression data from the tissue microarrays run on each platform, then compared this to bulk RNA-seq data from cancer (TCGA) and normal (GTEx) tissue samples that matched the cancer and tissue types used in the spatial imaging assays.
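The pseudobulk comparison can be sketched as follows: per-cell counts are summed into one total per gene, then compared to bulk values on a log scale. This is a minimal illustration under stated assumptions (a log10(count + 1) transform and Pearson correlation), not the authors' pipeline; the gene names and values are hypothetical.

```python
import math


def pseudobulk(cell_by_gene):
    """Sum per-cell counts into a single total per gene."""
    totals = {}
    for cell_counts in cell_by_gene:
        for gene, count in cell_counts.items():
            totals[gene] = totals.get(gene, 0) + count
    return totals


def log_pearson(x_by_gene, y_by_gene):
    """Pearson correlation of log10(value + 1) over genes present in both inputs."""
    genes = sorted(set(x_by_gene) & set(y_by_gene))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    xs = [math.log10(x_by_gene[g] + 1) for g in genes]
    ys = [math.log10(y_by_gene[g] + 1) for g in genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical spatial cells and bulk RNA-seq values (illustrative only).
cells = [{"EPCAM": 5, "CD3E": 0}, {"EPCAM": 3, "CD3E": 1}, {"KRT8": 10}]
bulk = {"EPCAM": 800, "CD3E": 90, "KRT8": 1100}

totals = pseudobulk(cells)
r = log_pearson(totals, bulk)
print(totals, round(r, 3))
```

Plotting the two log-scale vectors against each other, rather than only reporting the correlation, is what would reveal a shape such as an upswing at low expression.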

What did they find?

The Cook group found that Xenium data correlated well with snRNA-seq across the whole dynamic range, and also offered a greater dynamic range and increased sensitivity over CosMx (Figure 3, top panel). In examining the CosMx data, the authors noted, "...a noticeable inflation of lowly expressed genes, contributing to its compressed dynamic range."

Wang et al. highlighted good correlation between Xenium, MERSCOPE, and CosMx and bulk RNA-seq data (Figure 3, bottom panel). Their data also demonstrated the same potential issue with low-expression transcripts, reporting, “...the presence of a characteristic upswing for CosMx, even when comparing to orthogonal data, further shows that there is a higher false positive rate for lower expression level genes in the CosMx data.” 

Wang et al. also made a point of comparing sensitivity between the panels. Specifically, they found that, “Xenium showed 2.6-fold higher median expression with the breast panels (10 μm) than MERSCOPE, and 13.6-fold higher median expression with the lung panels (5 μm).” The difference was even larger relative to CosMx, with Wang et al. stating that, “Xenium consistently showed higher expression levels on the same genes than CosMx in the tumor TMA, with the Xenium breast having 14.6-fold more counts than the CosMx multi-tissue data sets. The Xenium multi-tissue panel data showed a slightly smaller difference, with 12.3-fold higher expression on the same genes, while the lung panel, which was acquired closest in time following slicing, also displayed a median of 14.0-fold higher expression.”



What was the authors’ takeaway?

“We then compared transcript abundances from the two platforms to the matched snPATHO-seq. Although these counts were averaged across the entire population, we observed that the Xenium data correlated well with the snPATHO-seq data across its entire range of detection. In contrast, the CosMx data exhibited a noticeable inflation of lowly expressed genes, contributing to its compressed dynamic range. As a result, genes varying three orders of magnitude in the snPATHO-seq data had similar detection in the CosMx data. This is compounded by a reduced sensitivity for highly expressed genes, which were recovered at lower levels than observed in the snPATHO-seq.” –Cook et al.

Compared performance using overlapping genes

Why does it matter?

The total number of genes (plexity) on a panel can bias some metrics, including the total number of unique genes detected per cell and the total number of transcripts detected (both in single cells and across the tissue), especially if the authors do not account for noise. Analyzing the same complement(s) of genes across platforms, as well as transcripts above noise, helps provide a more direct comparison of performance.

How did the authors use it in these studies?

Cook et al. explicitly noted the number of overlapping genes (125) in addition to total panel size (377 genes for Xenium and 960 genes for CosMx). When comparing metrics that may be influenced by plexity size (such as number of unique genes detected per cell and median transcripts per gene), they provided the overall data, data normalized by number of targets, and data compared across only shared genes.
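Restricting a comparison to shared genes amounts to intersecting the two panels before computing any metric. The sketch below does this for transcripts per cell; the function name, panels, counts, and cell numbers are hypothetical and are not taken from either study.

```python
def shared_gene_transcripts_per_cell(counts_a, n_cells_a, counts_b, n_cells_b):
    """Compare transcripts per cell using only genes present on both panels."""
    shared = sorted(set(counts_a) & set(counts_b))
    if not shared:
        raise ValueError("the two panels share no genes")
    per_cell_a = sum(counts_a[g] for g in shared) / n_cells_a
    per_cell_b = sum(counts_b[g] for g in shared) / n_cells_b
    return shared, per_cell_a, per_cell_b


# Hypothetical per-gene transcript totals for two platforms (illustrative only).
panel_a = {"EPCAM": 600, "CD3E": 150, "ESR1": 400}
panel_b = {"EPCAM": 200, "CD3E": 50, "MKI67": 900}

shared, per_cell_a, per_cell_b = shared_gene_transcripts_per_cell(
    panel_a, 100, panel_b, 100
)
print(shared, per_cell_a, per_cell_b, per_cell_a / per_cell_b)
```

In this toy example the second panel has more total transcripts overall because of a gene only it contains, but the first detects more on the genes the two have in common, which is the kind of plexity effect the shared-gene comparison is meant to remove.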

In Wang et al., the authors compared multiple Xenium and MERSCOPE tissue panels to CosMx, noting that all six of the panels they focused on shared >94 genes, with a high degree of similarity between the Xenium and MERSCOPE panels. Similar to the Cook study, this group also reported metrics from both total and shared genes.

What did they find?

As expected, given its higher plexity, the CosMx panel identified a greater number of unique genes per cell than Xenium in the Cook study. However, when examining the median per-gene number of transcripts detected across all genes, Xenium resolved roughly twice as many transcripts per gene as CosMx. Consistent with these findings, Xenium also resolved three times as many transcripts per cell as CosMx when limited to only the 125 genes shared between panels (Figure 4, top panel).

Wang et al. reported results consistent with the Cook study. When comparing Xenium, MERSCOPE, and CosMx, the higher-plex CosMx panel detected the highest absolute number of unique genes in 14 of the 20 tested tissue types when considering all panel genes. However, Xenium not only detected a greater absolute number of genes in the 6 other sample types, but also detected the greatest percentage of on-panel genes across all 20 samples, followed by either MERSCOPE or CosMx (depending on sample type) (Figure 4, bottom panel).



What was the authors’ takeaway?

“When this analysis was also restricted to shared genes, we also found that Xenium consistently had higher expression levels across each tissue type, with no clear differences between performance on either tumor or normal tissue.” –Wang et al.

Compared segmentation, cell clustering, and spatial organization

Why does it matter?

Single cell spatial imaging technologies require both reliable cell segmentation and transcript detection to ensure you’re capturing single cells and assigning correct transcripts to them. 

How did the authors use it in these studies? 

Cook et al. focused on the stroma, taking advantage of its cellular heterogeneity, and examined epithelial, fibroblast, muscle, and macrophage markers (among others) in segmented cells. They characterized spatial organization based on the cell types identified in the tissue. Finally, they attempted to detect a rare cell type (CD8 T cells, defined by coexpression of CD3, CD8A, and GZMA).

Wang et al. first examined the expression of pairs of non-overlapping marker genes, rationalizing that well-segmented cells should have a low proportion of cells expressing both genes in a pair. They also looked at the cell types each platform was able to resolve in several tissue types based on canonical markers.
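The marker-pair check can be sketched as a simple coexpression fraction: among cells expressing either marker, what share expresses both? The detection threshold (`min_count`) and the toy cells below are assumptions for illustration, not the authors' parameters or data.

```python
def coexpression_fraction(cells, gene_a, gene_b, min_count=1):
    """Fraction of cells expressing either marker that express both.

    min_count is an assumed detection threshold; a cell "expresses" a gene
    when its count for that gene is at least min_count.
    """
    either = 0
    both = 0
    for counts in cells:
        has_a = counts.get(gene_a, 0) >= min_count
        has_b = counts.get(gene_b, 0) >= min_count
        if has_a or has_b:
            either += 1
        if has_a and has_b:
            both += 1
    return both / either if either else 0.0


# Hypothetical segmented cells with per-gene counts (illustrative only).
cells = [
    {"CD19": 4},             # B cell-like
    {"CD3E": 6},             # T cell-like
    {"CD19": 2, "CD3E": 1},  # expresses both: possible segmentation error or noise
    {"EPCAM": 9},            # expresses neither marker in the pair
]

fraction = coexpression_fraction(cells, "CD19", "CD3E")
print(f"{fraction:.2f}")
```

A low fraction for a pair such as CD19 and CD3E suggests that transcripts are being assigned to the correct cells; a high fraction points to segmentation errors or false-positive calls.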

What did they find?

Both studies highlighted that the nuclear boundary expansion used by Xenium for cell segmentation* resulted in cells that were consistently larger than those from CosMx and MERSCOPE. In spite of this, the Wang group found that Xenium was able to clearly separate lineages denoted by non-overlapping marker genes (CD19 and CD3E for B and T cells, CD8 and CD4 for T cell subsets, and CD3E and EPCAM for epithelial cancer). While MERSCOPE was also able to resolve these lineages, CosMx distinguished only the EPCAM and CD3E pair and not the other pairs (which the authors hypothesized may be due to false positives or cell segmentation errors, given the low counts for these immune genes).

Wang et al. found that Xenium consistently captured more cell types than CosMx (Figure 5). Xenium identified 9 distinct cell types in each of the breast, lung, and breast cancer samples. CosMx only resolved 6, 6, and 8 cell types, respectively, and was unable to resolve all known major cell types in breast and lung. MERSCOPE identified 6 cell types in breast cancer, and the researchers noted, “...a clearer one-to-one mapping between MERSCOPE and Xenium clusters than Xenium and CosMx clusters.”

Using snRNA-seq as their ground truth, Cook et al. found a greater-than-expected proportion of macrophage and fibroblast markers in muscle cells in Xenium data. Similarly, endothelial, muscle, and fibroblast markers were seen in pericytes, which they hypothesized was due to neighboring cell contamination. In CosMx data, they noted reduced specificity of marker genes and stated that it could be due to, “...a lower abundance of on-target markers and higher noise levels, rather than segmentation errors.”

Finally, Cook et al. attempted to detect CD8 T cells using both Xenium and CosMx. While Xenium identified a corresponding population, this population was absent in the CosMx data (Figure 6, top panel). Additionally, while markers of lymphocytes—small cells with low RNA content—exhibited punctate expression in tissue on Xenium, in CosMx SMI these markers were distributed throughout the tissue and insufficiently expressed for cell characterization (Figure 6, bottom panel). 



What was the authors’ takeaway?

“Both the global and tissue clustering results show that CosMx is also able to recognize the major cell types, but cannot identify cell subtypes. Additionally, since the cluster-enriched genes do not correspond to well-known markers, probably due to the low expression caused by low sensitivity and specificity, cell type annotation was particularly difficult.” –Wang et al.

*Note: When these studies were performed, the default Xenium cell segmentation was a 15 µm nuclear expansion. Since then, this distance has been reduced to 5 µm, and Xenium now has the option for morphology-driven cell segmentation with the Xenium Multimodal Cell Segmentation Kit.

Highlighting the concordance of these findings

These two independent studies are notable not just for their individual findings, but for how concordant those findings were. In spite of using different tools for orthogonal benchmarking, different tissue types, and different approaches, both studies found that:

  • Xenium exhibited higher specificity than CosMx (measured either by percentage of on-target genes versus negative control probes, or by FDR).
  • Xenium correlated well with snRNA-seq and RNA-seq across its entire dynamic range, while CosMx had inflated counts in lower expression transcripts (which was explicitly highlighted in both manuscripts).
  • Xenium consistently detected more transcripts than CosMx on genes shared between both panels.
  • Xenium was able to detect cell types that CosMx did not, including lymphocytes and rare/difficult-to-detect cell types.

Additionally, both groups have made their data publicly available for re-analysis and incorporation into future comparison studies.

Touching on comparison study limitations

A well-conducted study will highlight the limitations of its experimental design. For example:

Cook et al. acknowledged a limited sample size, consisting of a single tissue type from a single patient (though the authors are following up with a much larger study). This is an important consideration given the potential impact of tissue type and complexity on assay performance. Furthermore, this study—as well as Wang et al.—focused on FFPE tissue and did not assess performance on fresh frozen samples (another commonly used preservation method for spatial assays). 

Wang et al. acknowledged unequal times elapsed between sectioning and imaging tissue. While they attempted to pair timepoints for each technology, they initially sectioned MERSCOPE tissue thinner than manufacturer’s recommendations, resulting in a potentially unequal comparison between Xenium and MERSCOPE at matched timepoints (though they ran another set of samples that followed the manufacturer’s recommended thickness). We focused much of the comparative data in this piece on the samples that showed minimal variance. They also noted that making the MERSCOPE panel compatible with all tissues necessitated the removal of several genes, potentially impacting its performance versus Xenium.

Capturing a more complete picture: Acknowledging other comparison studies

This article takes a deep dive into two recent independent benchmarking studies that used differing methods but reached similar conclusions. Two other recent studies have also been conducted.

The first is an independent study from Hartman and Satija at the New York Genome Center. Their work compared 6 different single cell spatial imaging technologies (including Xenium and MERSCOPE) in mouse brain using a Baysor-based cell segmentation strategy. The authors pointed out several limitations in their study, specifically that they focused on a single anatomical feature (cortex) in a single tissue type (fresh frozen mouse brain). This work used publicly available datasets for both single cell spatial imaging platforms and scRNA-seq analyses, not side-by-side comparisons from the same mice or neuroanatomical location.

The second is a NanoString-sponsored study from the Dulai group at Northwestern University. Their study compares the performance of Xenium and CosMx in FFPE ileal and rectal human biopsies. While the webinar did not touch on limitations or caveats in this study, several potential points to consider are:

  • No data was provided for measuring assay specificity and/or noise (a relevant consideration given the overall lower specificity and ‘upward curve’ observed in CosMx by Cook and Wang et al.).
  • While a public scRNA-seq dataset was used to annotate cells, no orthogonal technology was used to provide a ‘ground truth’ for comparing assay performance between both platforms and individual samples.
  • The authors state that CosMx provides less variability in cell-type proportions than Xenium, but do not provide data on the accuracy of called cell types for each platform.
  • No visualizations of cell clustering or cell type annotation, whether in UMAP or spatial distribution, are shown, making it challenging to determine whether called cell types cluster together or are artifacts of noise.
  • The authors state that CosMx performs consistently regardless of RNA quality; however, only 5 of 16 samples are shown, and no acknowledgement of a negative correlation with higher quality samples in CosMx data is made.
  • A claim is made that CosMx measures ~6 times more genes per cell than Xenium; this claim is not made by the webinar speaker and is derived from all genes on each panel (rather than overlapping genes) with no mention of the % of genes above noise. 

Single cell spatial imaging technologies are rapidly evolving with advances in plexity, analytical methods, cell segmentation modalities, software, and more. These studies represent what was available to the authors at the time of the study, and understanding the advances that have been made since publication is critical to ensure you choose the platform that best fits your needs.

Looking to go deeper with comparisons between single cell spatial imaging platforms? Watch David Cook and Dr. Luciano Martelotto’s webinar below!

 
