Preprocessing of single cell multiplexed image data

Reading Data

SpatialTis provides a preprocessing module to help you convert your data into AnnData. Currently, SpatialTis can read single-cell information from a multichannel image file with a prepared mask image, as produced by technologies such as imaging mass cytometry (IMC) and multiplexed ion beam imaging (MIBI). For other spatial single-cell technologies, it’s easy to transform your expression matrix into AnnData.

  • Input data must be contained in one entry folder

  • Each ROI must be stored in a separate sub-folder with its mask image

  • The folder structure must be organized according to your experimental design.

Let’s say your file structure looks like this:

Data
├── Patient1
│   ├── Sample1
│   │   ├── ROI1
│   │   └── ROI2
│   ├── Sample2
│   │   ├── ROI1
│   │   └── ROI2
│   └── Sample3
│       ├── ROI1
│       └── ROI2
├── Patient2
├── Patient3
└── ...

Each ROI folder contains two data files:

  • One mask image file

  • One stacked channels image file

There must always be a mask image that tells SpatialTis what each cell looks like; SpatialTis CAN’T do segmentation.

If your image channels are stored in separate images, like below:

ROI1
├── Patient1_Body_ROI1_Dy161_CD20.tiff
├── Patient1_Body_ROI1_Dy162_CD8.tiff
├── Patient1_Body_ROI1_Dy164_CD99.tiff
├── Patient1_Body_ROI1_Er166_NFkB.tiff
├── ...
└── Patient1_Body_ROI1_mask.tiff

Please stack them into a single image file. Your folder should eventually look like this:

ROI1
├── stacked.tiff
├── Patient1_Body_ROI1_mask.tiff
└── ...
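
One way to do the stacking is sketched below. It assumes the third-party packages numpy and tifffile are installed, and that every channel file follows the `<prefix>_<channel>_<marker>.tiff` naming shown above. The helper names `parse_channel_files` and `stack_channels` are made up for this illustration and are not part of SpatialTis.

```python
from pathlib import Path


def parse_channel_files(filenames, mask_pattern="mask"):
    """Return (filename, channel, marker) tuples sorted by filename.

    Hypothetical helper. Assumes names like
    "Patient1_Body_ROI1_Dy161_CD20.tiff", where the last two
    underscore-separated tokens are the channel and the marker.
    """
    parsed = []
    for name in sorted(filenames):
        stem, _, ext = name.rpartition(".")
        # Skip non-TIFF files and the mask image.
        if ext.lower() not in ("tif", "tiff") or mask_pattern in stem:
            continue
        parts = stem.split("_")
        if len(parts) < 3:
            continue  # e.g. an existing "stacked.tiff"
        parsed.append((name, parts[-2], parts[-1]))
    return parsed


def stack_channels(roi_dir, output_name="stacked.tiff"):
    """Stack all single-channel images in roi_dir into one multi-page TIFF."""
    # Imported here so the filename helper above works without these packages.
    import numpy as np
    import tifffile

    roi_dir = Path(roi_dir)
    names = [p.name for p in roi_dir.iterdir() if p.is_file()]
    parsed = parse_channel_files(names)
    # One 2D image per channel, stacked along a new first axis: (C, H, W).
    stack = np.stack([tifffile.imread(str(roi_dir / name)) for name, _, _ in parsed])
    tifffile.imwrite(str(roi_dir / output_name), stack)
    # Return the layer order so the marker table can be built to match it.
    return [(channel, marker) for _, channel, marker in parsed]
```

The returned list records the order in which the layers were written, which is the order your marker table has to follow later on.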

First, we need to specify the path of the entry folder of our data:

entry = '/Data'

Next, we need to describe how the experiment is designed. The names should correspond to each level of the folder hierarchy; if you look back at the file tree shown earlier, this is easy to understand:

obs_name = ['Patient', 'Sample', 'ROI']
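
To make the pairing between names and folder levels concrete, here is a small sketch using only the standard library. It is an illustration of the idea, not how SpatialTis walks the tree internally; the function name `label_roi_folders` is hypothetical.

```python
import tempfile
from pathlib import Path


def label_roi_folders(entry, obs_name):
    """Pair every folder at depth len(obs_name) with one name per level.

    Hypothetical helper for illustration only.
    """
    entry = Path(entry)
    pattern = "/".join(["*"] * len(obs_name))  # e.g. "*/*/*" for three levels
    labelled = []
    for folder in sorted(entry.glob(pattern)):
        if not folder.is_dir():
            continue
        levels = folder.relative_to(entry).parts
        labelled.append(dict(zip(obs_name, levels)))
    return labelled


# Build a tiny throwaway tree that mimics the layout described above.
with tempfile.TemporaryDirectory() as tmp:
    for sample in ("Sample1", "Sample2"):
        for roi in ("ROI1", "ROI2"):
            (Path(tmp) / "Patient1" / sample / roi).mkdir(parents=True)
    rois = label_roi_folders(tmp, ["Patient", "Sample", "ROI"])

print(rois[0])
```

Each ROI folder ends up with one value per name, which is why the length of obs_name has to equal the depth of the folder tree.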

Another piece of information is the marker data, which needs to be stored in a dataframe. This allows you to add as many columns as you want; for example, you can add an extra “channels” column. But remember that the order of your markers must align with the order of the layers in your image file, following either the order of channels or the order of pages, depending on the structure of your .tiff:

import pandas as pd

channels = ['Dy161', 'Dy162', 'Dy164', 'Er166', ...]
markers = ['CD20', 'CD8', 'CD99', 'NFkB', ...]
var = pd.DataFrame({"channels": channels, "markers": markers})
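
Before reading the data, it can be worth checking that the stacked image really has one layer per marker. The sketch below takes the array shape of a stacked image and reports which axis holds the layers. The function name `find_layer_axis` is hypothetical, and the check only covers the number of layers; the order still has to be verified by hand.

```python
def find_layer_axis(image_shape, markers):
    """Return the axis of a 3D stack whose length equals the marker count.

    Hypothetical helper, not part of SpatialTis.
    """
    if len(image_shape) != 3:
        raise ValueError(f"expected a 3D stack, got shape {image_shape}")
    n_markers = len(markers)
    # Axis 0: one page per marker (pages first). Axis 2: channels last.
    for axis in (0, 2):
        if image_shape[axis] == n_markers:
            return axis
    raise ValueError(
        f"{n_markers} markers do not match any layer axis of shape {image_shape}"
    )


markers = ["CD20", "CD8", "CD99", "NFkB"]
print(find_layer_axis((4, 1000, 1000), markers))  # pages-first stack
print(find_layer_axis((1000, 1000, 4), markers))  # channels-last stack
```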

Now, let’s read it out:

data = read_ROIs(entry, obs_name, var,
                 mask_pattern="mask", img_pattern="stacked")

You may have noticed two more arguments. mask_pattern tells SpatialTis which file is the mask image; in our example, the file whose name contains “mask” will be used as the mask image. The same goes for img_pattern, so if you have other files in the same directory, these patterns help SpatialTis identify which file is the mask image and which is the data image.
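
The effect of the two patterns can be pictured with a plain substring match over the file names in one ROI folder. This is only a sketch under the assumption that matching works by "file name contains the pattern", as described above; `pick_files` is a hypothetical name and not a SpatialTis function.

```python
def pick_files(filenames, mask_pattern="mask", img_pattern="stacked"):
    """Pick the mask image and the data image by substring match.

    Hypothetical helper; the real matching in SpatialTis may differ.
    """
    masks = [f for f in filenames if mask_pattern in f]
    images = [f for f in filenames if img_pattern in f]
    # Exactly one file of each kind is expected per ROI folder.
    if len(masks) != 1 or len(images) != 1:
        raise ValueError(
            f"expected one mask and one image, got {masks} and {images}"
        )
    return masks[0], images[0]


roi_files = ["stacked.tiff", "Patient1_Body_ROI1_mask.tiff", "notes.txt"]
mask_file, image_file = pick_files(roi_files)
print(mask_file, image_file)
```

Choosing patterns that match exactly one file each keeps the selection unambiguous when a folder holds extra files.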

Finally, we can start processing your images into AnnData. The speed depends on the size of your image; in my own test, a 1000 × 1000 ROI from IMC data takes ~15 s:

data = data.to_anndata()

If you have a large dataset, you can set mp=True to enable parallel processing:

data = data.to_anndata(mp=True)

The default method to determine the cell shape is “convex hull”; another option is “concave hull” (see Determine cell shape). Although “concave hull” will give you a more accurate shape, I strongly recommend using “convex hull”.
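
For readers unfamiliar with the term, the sketch below shows what a convex hull of a cell's pixel coordinates is, using Andrew's monotone chain algorithm in plain Python. It illustrates the concept only and is not the implementation used by SpatialTis.

```python
def convex_hull(points):
    """Convex hull via Andrew's monotone chain.

    Returns the hull vertices in counter-clockwise order, starting from
    the lexicographically smallest point. Collinear points are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Pixel coordinates of a small L-shaped cell plus one inner-corner pixel.
cell = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]
print(convex_hull(cell))
```

The hull keeps only the outer corners, so any inward bend of the cell outline is lost. A concave hull would follow such bends more closely.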

After processing, make sure to save your data to disk:

data.write(filename="sample.h5ad")

Let’s see what’s in the data:

print(data)
"""
AnnData object with n_obs × n_vars = 152037 × 36
    obs: 'Patient', 'Sample', 'ROI', 'area', 'cell_shape', 'centroid', 'eccentricity'
    var: 'channels', 'markers'
"""

This means there are 152037 cells with 36 markers. In the obs field, ‘Patient’, ‘Sample’ and ‘ROI’ are the names for the different experimental conditions, while ‘area’, ‘cell_shape’, ‘centroid’ and ‘eccentricity’ are calculated by SpatialTis.
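
To give a feel for what the calculated fields mean, the sketch below derives an area, a centroid and an eccentricity for one labelled cell in a tiny mask. It uses simplified image-moment definitions (area as pixel count, eccentricity from the eigenvalues of the pixel covariance matrix); how SpatialTis computes these values is not described here, so treat the formulas and the function name `cell_properties` as assumptions.

```python
import math


def cell_properties(mask, label):
    """Area, centroid and eccentricity of one labelled cell in a 2D mask.

    Hypothetical helper with simplified definitions, for illustration only.
    """
    pixels = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value == label
    ]
    area = len(pixels)
    if area == 0:
        raise ValueError(f"label {label} not found in mask")

    # Centroid: mean row and column of the cell's pixels.
    cr = sum(r for r, _ in pixels) / area
    cc = sum(c for _, c in pixels) / area

    # Second central moments (entries of the 2x2 covariance matrix).
    m_rr = sum((r - cr) ** 2 for r, _ in pixels) / area
    m_cc = sum((c - cc) ** 2 for _, c in pixels) / area
    m_rc = sum((r - cr) * (c - cc) for r, c in pixels) / area

    # Eigenvalues of the covariance matrix: spread along major/minor axes.
    half_trace = (m_rr + m_cc) / 2
    spread = math.sqrt(((m_rr - m_cc) / 2) ** 2 + m_rc ** 2)
    major = half_trace + spread
    minor = max(half_trace - spread, 0.0)

    # 0 for a round or square blob, 1 for a straight line of pixels.
    eccentricity = math.sqrt(1 - minor / major) if major > 0 else 0.0
    return area, (cr, cc), eccentricity


# 0 is background; cell 1 is a 2x2 block, cell 2 is a 1x4 line.
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [2, 2, 2, 2],
]
print(cell_properties(mask, 1))
print(cell_properties(mask, 2))
```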