SlideIO: A New Python Library for Reading Medical Images

Introduction

Medical images, that is, images produced by microscopes, scanners, and other devices, differ from regular pictures. One key difference is size: these images can be very large, and slides of several gigabytes are not uncommon these days. Another difference is the number of dimensions. Several bio-image formats support three and four dimensions (volumes and time series). In addition to the standard dimensions, some formats store scanner-specific dimensions such as focus distance, rotation (for data acquired from different angles), and phase index.

A multi-gigapixel image cannot be handled efficiently with standard compression techniques. With codecs such as JPEG or PNG, the entire image must be loaded into memory to display it on screen, or even to read a small region of it. Bio-image formats overcome this with tiling and zoom pyramids, which make it possible to read any region of an image at any scale with minimal memory and processing cost. A zoom pyramid is a set of copies of the image at different scales.
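To make the idea of tiling and zoom pyramids more concrete, here is a minimal Python sketch. It is not taken from any particular file format or library: the function names, the downscale factor of 2, and the 256-pixel tile size are all assumptions chosen for illustration. It shows how a reader could pick a suitable pyramid level and work out which tiles overlap a requested region, so that only those tiles need to be decoded.

```python
import math


def pyramid_levels(width, height, min_size=256, factor=2):
    """Return the (width, height) of each pyramid level, largest first.

    Hypothetical scheme: each level is `factor` times smaller than the
    previous one, until the longer side fits into `min_size` pixels.
    """
    levels = [(width, height)]
    while max(width, height) > min_size:
        width = max(1, math.ceil(width / factor))
        height = max(1, math.ceil(height / factor))
        levels.append((width, height))
    return levels


def pick_level(levels, target_width):
    """Pick the smallest level that is still at least target_width wide."""
    candidates = [level for level in levels if level[0] >= target_width]
    # Levels are ordered largest first, so the last candidate is the smallest.
    return candidates[-1] if candidates else levels[0]


def tiles_for_region(x, y, w, h, tile_size=256):
    """Return (column, row) indices of the tiles that overlap a region."""
    first_col, first_row = x // tile_size, y // tile_size
    last_col = (x + w - 1) // tile_size
    last_row = (y + h - 1) // tile_size
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


levels = pyramid_levels(20320, 19545)
print(pick_level(levels, 500))                          # level used for a 500-pixel-wide preview
print(len(tiles_for_region(5000, 5000, 5000, 5000)))    # number of tiles touched by the region
```

With this layout, a small preview is read from a coarse level, and a region at full resolution touches only the tiles it overlaps rather than the whole image.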

SlideIO: A New Python Library for Reading Medical Images

The purpose of the Slideio library is to read medical images as efficiently as possible by making use of their internal structure. Slideio is not the first library to offer this kind of functionality. I have used a variety of libraries extensively in my image analysis practice, but so far I have not found one that meets all of my image analysis needs. So I decided to write my own, bringing together what I have learned in this field.

The library has a driver-based architecture. Each driver supports one or more image formats. The first version of slideio ships with four drivers:

CZI: driver for reading Zeiss CZI images.
SVS: driver for reading Aperio SVS images.
AFI: driver for reading Aperio fluorescent images.
GDAL: driver for reading generic formats such as JPEG, PNG, TIFF, etc. It uses GDAL, a popular C++ image library.

The object structure of the slideio library is straightforward:

Image drivers create Slide objects. A Slide object represents a single image file (or folder, depending on the image format). A Slide object contains at least one Scene object, which is a continuous raster region (a two-dimensional image, a volume, a time series, etc.). Some image formats support only a single scene, such as a single tissue scan; others can store multiple tissue regions in one file. All pixels of a 2D scene have the same size and resolution. If a scene is a 3D volume, every slice has the same size and resolution. The same holds for time series.

The snippet of code below demonstrates how to use the "SVS" image driver to open a slide:

 
import slideio

# image_path holds the path to an Aperio SVS file
slide = slideio.open_slide(image_path, 'SVS')
num_scenes = slide.num_scenes
scene = slide.get_scene(0)
print(num_scenes, scene.name, scene.rect, scene.num_channels)
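Since a slide may contain more than one scene, it can be handy to list them all. The helper below is a small sketch: describe_scenes is a hypothetical name, not part of slideio. It relies only on the attributes used in the snippet above (num_scenes, get_scene, name, rect, num_channels).

```python
def describe_scenes(slide):
    """Collect basic properties of every scene in an opened slide.

    `slide` is expected to behave like the object returned by
    slideio.open_slide; describe_scenes itself is a hypothetical helper.
    """
    summary = []
    for index in range(slide.num_scenes):
        scene = slide.get_scene(index)
        summary.append({
            "index": index,
            "name": scene.name,
            "rect": scene.rect,
            "num_channels": scene.num_channels,
        })
    return summary


# Typical use, assuming slideio is installed and image_path points to a file:
# slide = slideio.open_slide(image_path, 'SVS')
# for entry in describe_scenes(slide):
#     print(entry)
```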

Image metadata

Slideio exposes image metadata at several levels. The "raw_metadata" property of a Slide object returns the unmodified text extracted from the image. Its content is specific to the file format. For an Aperio SVS slide, it is a text string extracted from the "Image Description" TIFF field. For a Zeiss CZI file, it is an XML document that contains all of the file metadata. The following code snippet extracts metadata from an Aperio SVS file:

 
slide = slideio.open_slide(image_path, "SVS")
raw_string = slide.raw_metadata
# Aperio separates the metadata items with "|"
raw_string.split("|")

The snippet produces the following output:

 
['Aperio Image Library vFS90 01\r\n20320x19545 [0,100 19919x19445] (240x240) JPEG/RGB Q=70',
 'AppMag = 20',
 'StripeWidth = 2032',
 'ScanScope ID = SS1598',
 'Filename = 24496',
 'Date = 11/09/11',
 'Time = 18:51:40',
 'Time Zone = GMT+09:00',
 'User = e8ddb309-efc1-4a6b-b9b0-7c555f9fa0ef',
 'MPP = 0.4962',
 'Left = 23.939867',
 'Top = 19.531540',
 'LineCameraSkew = 0.000320',
 'LineAreaXOffset = 0.060417',
 'LineAreaYOffset = 0.011084',
 'Focus Offset = -0.000500',
 'DSR ID = ap6101-dsr',
 'ImageID = 24496',
 'Exposure Time = 109',
 'Exposure Scale = 0.000001',
 'DisplayColor = 0',
 'OriginalWidth = 20320',
 'OriginalHeight = 19545',
 'ICC Profile = ScanScope v1']   
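The list above is easier to work with as a dictionary. The following sketch shows one way to parse such a string; parse_aperio_metadata is a hypothetical helper, not a slideio function, and the sample string is a shortened copy of the output shown above. It assumes that the first item is a free-form header and that every other item has the form "key = value".

```python
def parse_aperio_metadata(raw_string):
    """Split an Aperio-style metadata string into a header and a dict.

    Hypothetical helper: assumes items are separated by "|", the first
    item is a free-form header, and the rest look like "key = value".
    """
    items = [item.strip() for item in raw_string.split("|")]
    header, fields = items[0], items[1:]
    metadata = {}
    for field in fields:
        key, separator, value = field.partition("=")
        if separator:  # skip items that do not contain "="
            metadata[key.strip()] = value.strip()
    return header, metadata


sample = ("Aperio Image Library vFS90 01\r\n"
          "20320x19545 [0,100 19919x19445] (240x240) JPEG/RGB Q=70"
          "|AppMag = 20|MPP = 0.4962|Date = 11/09/11")
header, metadata = parse_aperio_metadata(sample)
print(metadata["AppMag"], float(metadata["MPP"]))
```

All values are kept as strings; converting them (for example MPP to a float) is left to the caller, because the set of keys differs from scanner to scanner.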

Raster access

The primary object for raster data access is a scene. It exposes the following properties:

compression: type of data compression;
magnification: scanner magnification;
name: scene name;
num_t_frames: number of time frames in the time series;
num_z_slices: number of slices in the volume;
rect: coordinates and dimensions of the scene rectangle;
resolution: the resolution of the scene in-plane (a tuple);
t_resolution, z_resolution: resolutions of the scene in time and z-direction;
num_channels: number of channels in the scene;
channel_data_type: data type of an image channel (byte, 16 bit, etc.);
channel_name: name of an image channel.

The code snippet below retrieves the scene name, rectangle, number of channels, and resolution.

 
scene = slide.get_scene(0)
scene.name, scene.rect, scene.num_channels, scene.resolution

It produces the following result:

 
('Image', (0, 0, 19919, 19445), 3, (4.961999999999999e-07, 4.961999999999999e-07))   

The image is 19919 pixels wide and 19445 pixels high. Each pixel measures 0.4962 µm in both the x and y directions. The image has three channels. In bio-images, the meaning of a channel depends on the image format. For brightfield images, the channels are simply red, green, and blue, and such images have three 8-bit channels. Channel attributes are available through the methods get_channel_data_type and get_channel_name.

 
for channel in range(scene.num_channels):
    print(scene.get_channel_data_type(channel))   

Output:

 
uint8
uint8
uint8
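The resolution tuple reported for the scene earlier can be turned into physical dimensions. The sketch below assumes that the resolution is expressed in meters per pixel, which is consistent with the MPP value of 0.4962 in the raw metadata, and that the rectangle is laid out as (x, y, width, height). Both function names are hypothetical.

```python
def microns_per_pixel(resolution):
    """Convert a resolution given in meters per pixel to microns per pixel."""
    return tuple(value * 1e6 for value in resolution)


def physical_size_mm(rect, resolution):
    """Return the (width, height) of a scene in millimeters.

    Assumes rect = (x, y, width, height) in pixels and
    resolution = (x, y) in meters per pixel.
    """
    _, _, width_px, height_px = rect
    res_x, res_y = resolution
    return width_px * res_x * 1000.0, height_px * res_y * 1000.0


rect = (0, 0, 19919, 19445)
resolution = (4.962e-07, 4.962e-07)
print(microns_per_pixel(resolution))
print(physical_size_mm(rect, resolution))  # roughly 9.9 mm by 9.6 mm
```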

The read_block method retrieves the pixel values of a continuous region. Called without arguments, it returns the entire scene at its original size. Usually, the image is too large to be read in full at the original scale. In that case, the application has two options: read only a part of the image, or downscale it to a manageable size. The code sample below reads the entire image and scales it down to a width of 500 pixels. Note that a height of zero means that the height is calculated automatically so that the scale is the same in the x and y directions.

 
import matplotlib.pyplot as plt

image = scene.read_block(size=(500, 0))
plt.imshow(image)
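The automatic height calculation can be illustrated with a few lines of arithmetic. This is only a sketch of the idea: scaled_size is a hypothetical function, and the exact rounding used inside the library is an assumption here, so the real output may differ by a pixel.

```python
def scaled_size(rect, size):
    """Compute the output size of a block when the height is given as 0.

    rect is (x, y, width, height) of the region being read and size is the
    requested (width, height). A height of 0 is replaced by the value that
    keeps the same scale in x and y. Rounding to the nearest integer is an
    assumption of this sketch.
    """
    _, _, width, height = rect
    out_width, out_height = size
    if out_height == 0:
        out_height = max(1, round(height * out_width / width))
    return out_width, out_height


print(scaled_size((0, 0, 19919, 19445), (500, 0)))       # whole scene, 500 pixels wide
print(scaled_size((5000, 5000, 5000, 5000), (500, 0)))   # square region stays square
```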

The following code snippet reads a rectangular region of the image and scales it down to a width of 500 pixels.

 
image = scene.read_block((5000, 5000, 5000, 5000), size=(500,0))
plt.imshow(image)   

It is also possible to read a single channel or a subset of channels:

 
image = scene.read_block((5000, 5000, 5000, 5000), size=(500,0), channel_indices=[0])
plt.imshow(image, cmap='gray')   
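When a scene is too large to read at once but has to be processed at full resolution, one common approach is to walk over it block by block. The sketch below generates the block rectangles; block_rects is a hypothetical helper, and it assumes that read_block accepts rectangles as (x, y, width, height) in the same coordinate frame as scene.rect.

```python
def block_rects(scene_rect, block_size):
    """Split a scene rectangle into blocks of at most block_size pixels.

    scene_rect is (x, y, width, height). Blocks on the right and bottom
    edges are clipped so that they do not extend past the scene.
    """
    x0, y0, width, height = scene_rect
    rects = []
    for y in range(y0, y0 + height, block_size):
        for x in range(x0, x0 + width, block_size):
            block_width = min(block_size, x0 + width - x)
            block_height = min(block_size, y0 + height - y)
            rects.append((x, y, block_width, block_height))
    return rects


rects = block_rects((0, 0, 19919, 19445), 5000)
print(len(rects), rects[0], rects[-1])

# Possible use with an opened scene (requires slideio and an image file):
# for rect in block_rects(scene.rect, 5000):
#     block = scene.read_block(rect)
```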

Volumes and time series can be read with the additional tuple parameters slices and frames:

Code:

 
slide = slideio.open_slide(image_path, 'CZI')
scene = slide.get_scene(0)
image = scene.read_block(slices=(0,scene.num_z_slices))
image.shape

Output:

 
(1000, 1000, 27)

Code:

 
slide = slideio.open_slide(image_path, 'CZI')
scene = slide.get_scene(0)
image = scene.read_block(frames=(0, 5))  # read 5 time frames, beginning with frame 0

slide = slideio.open_slide(image_path, 'CZI')
scene = slide.get_scene(0)
image = scene.read_block(slices=(0,10), frames=(0,5)) # read a multidimensional block of 10 z slices and 5 time frames   
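Multidimensional blocks grow quickly, so it can help to estimate the memory footprint before reading one. The sketch below is plain arithmetic and does not call slideio. The function name and the table of data type sizes are assumptions; only "uint8" appears in the output shown earlier, and the other names follow the usual numpy naming.

```python
import math

# Bytes per value for a few common data types (numpy-style names, assumed).
BYTES_PER_VALUE = {
    "uint8": 1, "int8": 1,
    "uint16": 2, "int16": 2,
    "float32": 4, "float64": 8,
}


def block_bytes(shape, data_type="uint8"):
    """Estimate the size in bytes of an uncompressed block of the given shape."""
    return math.prod(shape) * BYTES_PER_VALUE[data_type]


# The (1000, 1000, 27) volume shown above, stored as 8-bit values:
print(block_bytes((1000, 1000, 27)) / 1e6, "MB")

# The full 19919 x 19445 RGB scene from the SVS example:
print(block_bytes((19445, 19919, 3)) / 1e9, "GB")
```

Even the moderately sized SVS scene from the earlier example takes more than a gigabyte when fully decoded, which is why reading regions or downscaled versions is usually preferable.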

Installation

The slideio library can be installed with pip:

pip install slideio

Only Windows and Linux builds are supported at this time.

Conclusion

Slideio is a Python module for reading medical images. It can read whole slides as well as selected regions of a slide. Large slides can be downscaled efficiently: to speed up scaling, the module uses the zoom pyramids stored inside the images. In addition to 2D slides, slideio can handle 3D data sets and time series.

The library returns rasters as NumPy arrays, so it works with many popular image analysis packages, including OpenCV.

At present, it can read generic formats as well as Zeiss CZI files and Aperio SVS and AFI files. Drivers for the following formats will soon be available: