Several factors should be considered when choosing a spatial molecular imaging platform, especially when it comes to drilling down into the molecular underpinnings of disease. High sensitivity ensures you are maximizing the detection of individual transcripts present in a sample. High specificity gives you confidence that you measured your intended transcript with limited off-target effects. High throughput and speed are critical because they give you the answers you need when you need them. Finally, being able to seamlessly access, visualize, and analyze your data without constraints, and at no cost, allows you to focus on science and make your next breakthrough.
But how can you measure these specifications among the various platforms? We recognize that this can be a confusing question, since high-plex and high-throughput in situ technologies are new and there is no clear standard definition for some of these specifications. To help you, we have tried to clearly define how we measure these specifications and explain why we think these are the right ways to measure performance metrics for in situ technology. While we know that others may use different measures and arrive at different conclusions, we believe it is important to clearly present our methodology so that you can decide for yourself.
Below we analyze data from two in situ technologies: the 10x Genomics Xenium In Situ platform and the NanoString CosMx Spatial Molecular Imager. For CosMx, we used the datasets that are currently publicly available on NanoString’s website*: normal liver, liver cancer, and non-small cell lung cancer (NSCLC). For Xenium, we used internally generated data from the same sample types (normal liver, liver cancer, and NSCLC) obtained from a commercial vendor. We note that comparisons across in situ technologies on the same samples have not yet been performed; we will provide updated information on sample-matched comparisons as they become available. And, if you’re performing experiments on multiple platforms, we’d love to hear from you via the contact form below.
*: Data is available here.
How we measure specificity: An in situ technology is specific when the vast majority of signals that it calls are true biological transcripts rather than noise. We measure specificity using the false discovery rate (FDR), a commonly used statistic in genomics, to characterize the expected fraction of calls made that are not true (1). Specifically, FDR is defined as the ratio of false positive calls to the total number of positive calls: FDR = (false positives) / (false positives + true positives) (see: https://en.wikipedia.org/wiki/Sensitivity_and_specificity and https://en.wikipedia.org/wiki/False_discovery_rate)
A lower FDR indicates higher confidence in the data being called. To calculate the expected FDR for in situ platforms, we use the provided negative control probe counts for each platform. We assume that negative control probes have been chosen to be a good proxy for genes, and that the average number of counts per negative control probe set represents the expected noise per gene. Using this information, we estimate false positives as the expected amount of noise per gene (estimated by the total negative control probe count divided by the number of negative control probes) multiplied by the number of genes in the panel. FDR is therefore calculated as this estimate of False Positives, divided by the Total Positives (given by the total transcript calls) (Figure 1A).
We believe that this method using negative controls from each technology serves as a good measure for in situ technology datasets and can be evaluated across sample types, tissue types, and panels. It is important to note that this method can only be performed when negative controls are provided for a technology, and when they have been well-designed to represent the true sources of noise in the system.
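The negative-control estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code behind Figure 1: the function name, argument names, and the example counts are hypothetical and chosen only to show the arithmetic.

```python
def estimate_fdr(total_negative_control_counts, n_negative_control_probes,
                 n_genes, total_transcript_calls):
    """Estimate FDR from negative control probes (illustrative sketch)."""
    if n_negative_control_probes <= 0 or total_transcript_calls <= 0:
        raise ValueError("probe count and total transcript calls must be positive")
    # Expected noise per gene: average count per negative control probe.
    noise_per_gene = total_negative_control_counts / n_negative_control_probes
    # Expected false positives across the whole panel.
    expected_false_positives = noise_per_gene * n_genes
    # FDR = estimated false positives / total positive calls.
    return expected_false_positives / total_transcript_calls


# Hypothetical inputs, not taken from any dataset discussed here.
fdr = estimate_fdr(
    total_negative_control_counts=2_000,
    n_negative_control_probes=20,
    n_genes=400,
    total_transcript_calls=4_000_000,
)
print(f"Estimated FDR: {fdr:.2%}")
```

With these made-up inputs, the average noise is 100 counts per gene, giving 40,000 expected false positives out of 4,000,000 calls.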
Results: Using the above methodology for the Xenium datasets, we calculated FDRs for Xenium as being 1.41% and 0.61% for normal liver and liver cancer, and 0.59% and 0.71% for the NSCLC samples (Figure 1B).
Using the above methodology for the NanoString datasets, we calculated the FDRs for CosMx as being 7.2% and 5.64% for their normal liver and liver cancer datasets, and 16.28% and 25% for their NSCLC datasets (Figure 1C).*
Understanding the FDR can be important in determining the trustworthiness of your results. For example, if 300 transcripts are detected per cell (which is well within the usual range of detected transcripts), a 0.59% FDR means a researcher would expect an average of 2 misleading transcripts per cell, whereas a 25% FDR means the researcher would expect an average of 75 misleading transcripts per cell (Figure 1D). Furthermore, having a high FDR can be particularly problematic for low-expression genes, where a large fraction of the counts may be produced through noise rather than true biological signal. This effect is highlighted in the section on Sensitivity below.
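The per-cell example above is a single multiplication, shown here as a short Python sketch. The function name is hypothetical; the 300 transcripts per cell and the two FDR values are the ones quoted in the text.

```python
def expected_misleading_transcripts(transcripts_per_cell, fdr):
    """Expected number of false-positive transcripts in one cell."""
    return transcripts_per_cell * fdr


# FDR values quoted in the text: 0.59% and 25%, at 300 transcripts per cell.
for fdr in (0.0059, 0.25):
    n = expected_misleading_transcripts(300, fdr)
    print(f"FDR {fdr:.2%}: about {n:.1f} misleading transcripts per cell")
```

At 0.59% the product is roughly 1.8, which the text rounds to 2; at 25% it is 75.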
*: Per NanoString’s website, the CosMx NSCLC datasets appear to be generated on a prototype CosMx system.
Figure 1. Analyzing False Discovery Rates. The method used to calculate FDR (panel A). Xenium FDR for tissue data (panel B). CosMx FDR for NanoString data (panel C). Example of how FDR relates to the number of misleading transcripts (panel D) in a hypothetical cell with 300 transcripts.
How we measure sensitivity: Sensitivity is the percentage of starting molecules that are ultimately detected by a technology. An in situ technology is sensitive when it detects a high percentage of the true transcripts that are present in a sample.
Determining sensitivity requires knowing the true transcripts in a sample. Our method uses single cell RNA-seq (scRNA-seq) data to provide this independent knowledge of the true transcripts. Since both scRNA-seq and single cell in situ measure the same value, transcripts per cell, we believe that scRNA-seq provides a strong orthogonal benchmark for in situ analyses. High specificity and sensitivity mean strong concordance with scRNA-seq (Figure 2A), while poor specificity and sensitivity can cause inflated counts of low-expressing genes, masking of high-expressing genes, and more (Figure 2B).
To measure sensitivity for in situ data using scRNA-seq, we compare counts per cell for all genes in the relevant in situ panels to the values obtained in an scRNA-seq experiment. For each gene, we divide the in situ counts by the counts present in a reference scRNA-seq dataset to obtain a per-gene ratio of sensitivity to scRNA-seq. We then take the median of these relative expression values (either with or without correcting for noise, i.e. expected false positives) (Figure 2C).
When there is not a matched single cell dataset, assessing sensitivity can be challenging. For cancer samples, a matched dataset from normal tissue is likely not a good substitute. However, for healthy (‘normal’) samples, it is reasonable to compare in situ data to a third-party normal single cell reference. We therefore restricted our analysis to the normal liver datasets. We note that the analysis may differ for other sample types and other single cell reference data.
For each of the normal human liver datasets, we calculated the gene expression ratio of in situ versus single cell gene expression following the procedure described above (Figure 2C). We used a common publicly available liver dataset from the Human Protein Atlas (2) as the reference for each. We analyzed data for both raw counts and noise-corrected counts for what we defined as low, moderate, and highly expressed genes (0.01-0.1, 0.1-1, and >1 mean counts per cell, respectively). Noise-correction was done using the estimate of background noise given by the mean negative control probe counts.
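The procedure above (per-gene ratio, noise correction, expression bins, median) can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used for Figure 2: genes are binned by their mean counts per cell in the scRNA-seq reference, noise is subtracted from the in situ value and clipped at zero, and bin boundaries are assigned as shown in the code. The gene names and values are made up.

```python
from statistics import median


def expression_bin(mean_counts_per_cell):
    """Assign an expression level; boundary handling is an assumption."""
    if mean_counts_per_cell > 1.0:
        return "high"
    if mean_counts_per_cell >= 0.1:
        return "moderate"
    if mean_counts_per_cell >= 0.01:
        return "low"
    return None  # below the lowest bin, excluded


def median_relative_sensitivity(in_situ, reference, noise_per_gene_per_cell=0.0):
    """Median in situ / scRNA-seq ratio per expression bin.

    in_situ and reference map gene name -> mean counts per cell.
    Only genes present in both are used.
    """
    ratios = {"low": [], "moderate": [], "high": []}
    for gene in in_situ.keys() & reference.keys():
        level = expression_bin(reference[gene])  # assumption: bin on reference
        if level is None:
            continue
        # Assumption: subtract expected noise, never going below zero.
        corrected = max(in_situ[gene] - noise_per_gene_per_cell, 0.0)
        ratios[level].append(corrected / reference[gene])
    return {level: median(vals) for level, vals in ratios.items() if vals}


# Hypothetical mean counts per cell.
in_situ = {"GENE_A": 0.08, "GENE_B": 0.5, "GENE_C": 3.0, "GENE_D": 0.2}
reference = {"GENE_A": 0.05, "GENE_B": 0.5, "GENE_C": 2.0, "GENE_E": 1.0}

raw = median_relative_sensitivity(in_situ, reference)
corrected = median_relative_sensitivity(in_situ, reference,
                                        noise_per_gene_per_cell=0.01)
print(raw)
print(corrected)
```

In this toy example the same absolute noise value moves the low-expression ratio far more than the high-expression ratio, because the subtraction is a larger fraction of a small count.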
Figure 2. Measuring in situ sensitivity using single cell RNA-seq. Hypothetical example showing our methodology for comparing in situ with single cell RNA-seq for cases with good and poor sensitivity (panels A and B), with labels describing the cause of deviation from the correlation. Median sensitivity of in situ relative to single cell analysis was calculated by generating per-cell transcript counts for both single cell and in situ, calculating the per-gene ratio of in situ to single cell, then identifying the median sensitivity ratio (panel C). Low- (0.01-0.1 mean counts per cell), moderate- (0.1-1 mean counts per cell), and high-expression genes (>1 mean counts per cell) were compared using both raw counts and counts corrected for noise (expected false positive signal). Noise was determined as the estimated mean number of nonspecific transcripts per cell and gene, e.g., the false-positive rate or negative probe counts per gene and cell. Analysis only included transcripts that overlapped between the single cell analysis and the in situ panels. Xenium had a noise of 0.0073 transcripts per gene per cell (panel D, relative median sensitivities shown above bars). CosMx had a noise of 0.040 transcripts per gene per cell (panel E, relative median sensitivities shown above bars).
Results: Using the above methodology for the Xenium datasets, we calculated the median ratio of gene expression for Xenium compared to the scRNA-seq dataset as 170%, 105% and 122% for low, moderate, and highly expressed genes, respectively (Figure 2D). After correcting for expected false positive counts, these numbers adjusted to 139%, 103% and 122%. This noise correction reduced the median relative expression by 19%, 1.5% and 0.1% for low, moderate, and highly expressed genes, respectively.
Using the above methodology for the NanoString datasets, we calculated the median ratio of gene expression for CosMx compared to the scRNA-seq dataset as 220%, 55% and 25% for low, moderate, and highly expressed genes, respectively (Figure 2E). After correcting for expected false positive counts, these numbers adjusted to 99%, 38% and 23%. This noise correction reduced the median relative expression by 55%, 31%, and 9% for low, moderate, and highly expressed genes, respectively.
How we measure throughput: We measure throughput as the amount of tissue area that can be analyzed in a given amount of time with the instrument running at maximum capacity.
For Xenium, we calculated throughput based on an instrument runtime of 52 hours to analyze 2 slides with an imageable area of 2 cm² per slide with a panel of 483 genes. This is the estimated run time of the mid-year Xenium software release.
For CosMx, we calculated throughput based on specifications stated in the CosMx SMI Instrument User Manual (page 48). It states that, for a 4-slide, 1000-plex RNA run, customers should limit FOVs (0.5 mm x 0.5 mm) to 380 per flow cell (1,520 total FOVs) to keep the instrument run time within the 14-day recommendation.
Results: Analyzing twelve 1 cm² sections takes 6.5 days on the Xenium platform and 42 days with CosMx (Figure 3).
Figure 3. Throughput of Xenium and CosMx. Throughput capabilities of Xenium and CosMx for 1 cm² tissue sections.
Extending that analysis to annual throughput for experimental planning: Xenium can analyze 672 sections of 1 cm² in 1 year, while CosMx can analyze 104 sections of 1 cm² over the same time frame.
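The throughput figures can be reproduced with the short Python sketch below. The text does not spell out the arithmetic, so this is one reading that matches the stated numbers, under these assumptions: each Xenium run covers four 1 cm² sections (2 slides x 2 cm²); each CosMx slide (380 FOVs x 0.25 mm², about 0.95 cm²) is treated as one 1 cm² section, giving four per run; runs are back to back with no turnaround time; and only complete runs count toward the annual total. The function and constant names are hypothetical.

```python
import math

HOURS_PER_YEAR = 365 * 24


def days_for_sections(n_sections, sections_per_run, run_hours):
    """Days of instrument time to process n_sections, in whole runs."""
    runs = math.ceil(n_sections / sections_per_run)
    return runs * run_hours / 24


def sections_per_year(sections_per_run, run_hours):
    """Sections processed in one year, counting complete runs only."""
    full_runs = HOURS_PER_YEAR // run_hours
    return full_runs * sections_per_run


# CosMx imaged area per slide: 380 FOVs of 0.5 mm x 0.5 mm, in cm^2.
cosmx_area_per_slide_cm2 = 380 * 0.5 * 0.5 / 100

# (sections per run, run time in hours); see assumptions in the lead-in.
XENIUM = (4, 52)
COSMX = (4, 14 * 24)

for name, (per_run, hours) in (("Xenium", XENIUM), ("CosMx", COSMX)):
    print(name,
          days_for_sections(12, per_run, hours), "days for 12 sections;",
          sections_per_year(per_run, hours), "sections per year")
```

Under these assumptions the sketch gives 6.5 days and 672 sections per year for Xenium, and 42 days and 104 sections per year for CosMx, matching the figures above.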
Finally, the data that you get from Xenium is immediately usable after your run and can be further analyzed locally in a variety of open-source file formats. CosMx requires large files to be uploaded after the run to the cloud-based AtoMx platform for analysis (Table 1). See for yourself how easy it can be to analyze and explore your data on our interactive datasets demo page.
Table 1. Xenium In Situ and NanoString CosMx/AtoMx analysis platforms. Information on CosMx/AtoMx data analysis is sourced from the following CosMx/AtoMx user guides: MAN-10162-02 CosMx SMI Data Analysis User Manual and MAN-10170-02 AtoMx Spatial Informatics Platform User Manual. Data demonstrating FOV issues can be found here.
Our analysis highlights the strengths of Xenium in sensitivity, specificity, throughput, and data analysis, but it still has so much more to offer, including:
Figure 4. Xenium In Situ analysis and H&E staining of healthy and adenocarcinoma colon. FFPE tissue specimens were analyzed with Xenium and subjected to H&E staining for transcript and histology information on the same section. Both tissue types showed expected sensitivity, specificity, and performance, with default on-instrument Xenium clustering able to identify major cell types.
Innovation moves fast—so don’t fall behind. Take this chance to future-proof your technology with high sensitivity, specificity, and throughput. Reach out to one of our specialists today to learn more and get a demo of Xenium data.