Tips & Tricks
Skipping low-transcript regions
The best way to ignore background regions is to select the region of interest, because this reduces the analysis area and filters out unneeded transcripts.
However, you can also skip patches with low transcript density during cell-type assignment.
The min_transcripts argument allows you to skip the analysis of patches of the sample with low transcript density.
The actual number to use may depend on the technology/panel-size, the detection efficiency in your experiment, and the chunk size.
After splitting the sample into overlapping chunks for parallel processing, all chunks with fewer than the defined number of transcripts will be skipped.
See sainsc.LazyKDE.assign_celltype for details.
my_analysis.assign_celltype(signatures, min_transcripts=10)
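To get a feeling for a sensible threshold, it can help to count transcripts per chunk before running the assignment. The following is only a rough sketch, not how sainsc chunks the data internally: it uses simulated coordinates, a hypothetical chunk size of 250 pixels, and non-overlapping chunks for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated transcript coordinates (in pixels): a dense tissue area
# in one corner plus sparse background noise across the whole sample.
tissue = rng.uniform(0, 500, size=(20_000, 2))
background = rng.uniform(0, 1000, size=(200, 2))
coords = np.vstack([tissue, background])

chunk_size = 250  # hypothetical chunk edge length in pixels
edges = np.arange(0, 1000 + chunk_size, chunk_size)

# Count transcripts per (non-overlapping) chunk.
counts, _, _ = np.histogram2d(coords[:, 0], coords[:, 1], bins=(edges, edges))

# Report how many chunks each candidate threshold would skip.
for threshold in (10, 50, 100):
    skipped = int((counts < threshold).sum())
    print(f"min_transcripts={threshold}: {skipped} of {counts.size} chunks skipped")
```

Because the count per chunk scales with the chunk area, a threshold chosen this way only transfers to a run that uses a comparable chunk size.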
Technology-specific recommendations
Several parameter choices and workflow decisions in sainsc depend on the data and, specifically, on the technology used to generate it.
Gene/Feature selection
For imaging-based spatially resolved transcriptomics datasets with low-plex panels, you can typically keep all genes during cell-type assignment. For sequencing-based technologies, you will typically rely on feature selection to find a subset of genes that is useful for cell typing. The same applies to new, high-plex panels on imaging-based platforms such as Xenium Prime, and especially to Atera and other platforms that provide full transcriptome coverage. Genes can be selected based on prior knowledge, such as known cell-type markers or genes that are differentially expressed between the cell types to be assigned. Alternatively, highly variable or spatially variable genes can be detected from the data itself, e.g., after analysing it with existing cell segmentation or binning approaches.
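As an illustration of selecting genes from prior knowledge, the sketch below picks the most cell-type-specific genes from a signature table. It assumes the signatures are a pandas DataFrame with genes as rows and cell types as columns; the values, gene names, and the helper select_marker_genes are made up for this example and are not part of sainsc.

```python
import numpy as np
import pandas as pd

# Hypothetical signatures: rows are genes, columns are cell types.
signatures = pd.DataFrame(
    {
        "Neuron": [9.0, 0.5, 1.0, 2.0, 0.2],
        "Astrocyte": [0.4, 8.0, 1.1, 2.1, 0.3],
        "Microglia": [0.3, 0.6, 0.9, 1.9, 7.0],
    },
    index=["Snap25", "Gfap", "Actb", "Gapdh", "Cx3cr1"],
)


def select_marker_genes(signatures: pd.DataFrame, n_per_type: int = 1) -> pd.Index:
    """Pick, per cell type, the genes with the largest difference in
    log-expression compared to the mean of all other cell types."""
    log_expr = np.log2(signatures + 1)
    selected = []
    for celltype in log_expr.columns:
        others = log_expr.drop(columns=celltype).mean(axis=1)
        difference = log_expr[celltype] - others
        selected.extend(difference.nlargest(n_per_type).index)
    # Keep first occurrences only, in selection order.
    return pd.Index(selected).unique()


marker_genes = select_marker_genes(signatures, n_per_type=1)
signatures_subset = signatures.loc[marker_genes]
print(signatures_subset)
```

The reduced table can then be used in place of the full signatures. In practice, a proper differential expression test on the reference data is usually preferable to this simple ranking.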
Bandwidth for kernel density estimation
The appropriate bandwidth of the kernel used by sainsc for kernel density estimation is influenced by factors such as transcript diffusion. Diffusion, in turn, depends on the technology and is generally larger for sequencing-based than for imaging-based technologies. In general, we have observed bandwidths between 2 and 4 µm to perform well. Smaller bandwidths may not sufficiently integrate the gene expression, while larger bandwidths may oversmooth it.
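To make the effect of the bandwidth more concrete, the sketch below converts a bandwidth in µm to pixels and builds the corresponding Gaussian kernel with NumPy. This is a generic illustration rather than sainsc's own kernel code; the pixel size of 0.5 µm, the truncation at two bandwidths, and the function name are assumptions for this example.

```python
import numpy as np


def gaussian_kernel(
    bandwidth_um: float, pixel_size_um: float, truncate: float = 2.0
) -> np.ndarray:
    """Build a normalised 2D Gaussian kernel for a bandwidth given in µm."""
    bw_px = bandwidth_um / pixel_size_um  # bandwidth in pixels
    radius = int(np.ceil(truncate * bw_px))  # cut the kernel off at truncate * bandwidth
    offsets = np.arange(-radius, radius + 1)
    kernel_1d = np.exp(-0.5 * (offsets / bw_px) ** 2)
    kernel_2d = np.outer(kernel_1d, kernel_1d)
    return kernel_2d / kernel_2d.sum()  # weights sum to 1


pixel_size_um = 0.5  # assumed pixel size
for bw in (2.0, 4.0):
    kernel = gaussian_kernel(bw, pixel_size_um)
    print(
        f"bandwidth {bw} µm = {bw / pixel_size_um:.0f} px, "
        f"kernel size {kernel.shape[0]} x {kernel.shape[1]} px"
    )
```

Doubling the bandwidth doubles the kernel radius, so each pixel integrates transcripts from roughly four times the area. This is why large bandwidths blur the boundaries between neighbouring cells.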
How many threads to use
Sainsc makes heavy use of multi-threading to improve performance and speed up the analysis.
By default, it uses as many threads as there are available CPUs (but fewer than 32).
However, this is not always the best choice. If you are running sainsc on your laptop or personal computer, it may be better to reduce the number of threads. The same is true if other processes that also need CPU resources are running at the same time.
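One simple way to pick a lower thread count is to leave a few CPUs free for other work. The sketch below uses only the Python standard library; the helper suggest_n_threads and the default of two reserved CPUs are assumptions for this example. How the resulting number is passed to sainsc is described in the API reference.

```python
import os


def suggest_n_threads(reserve: int = 2) -> int:
    """Suggest a thread count that leaves `reserve` CPUs free for other processes."""
    try:
        # CPUs this process is allowed to use (not available on all platforms).
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        # Fall back to the total CPU count.
        available = os.cpu_count() or 1
    # Always use at least one thread.
    return max(1, available - reserve)


n_threads = suggest_n_threads(reserve=2)
print(f"Suggested number of threads: {n_threads}")
```

On shared machines or compute clusters, the number of CPUs allocated to your job is usually the better upper bound than the total number of CPUs in the machine.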