COMPYDA - Compare Your Biomedical Image Data

Introduction

Compyda is a free web-based tool for statistically comparing the characteristics of two image datasets. The tool was initially designed to verify the realism of generated biomedical image data, although its potential use is much broader. Augmented image data is essential for developing medical and biomedical image analysis algorithms. However, the usefulness of such data depends strongly on its quality and plausibility. Despite the emergence of many image synthesis frameworks in recent years, quality control of the generated images is still overlooked in many cases. If we want to use augmented image data, we should be aware of two critical aspects:

This integrated interactive guide helps check the two aforementioned requirements step by step. After the user uploads two datasets (the original one and the augmented one), Compyda presents the results of univariate analysis. For static data, the user can then also browse the results of multivariate analysis. Finally, all results, including plots, statistics and derived computations, can be downloaded.

Depending on the type of uploaded image data, Compyda computes the following characteristic descriptors:

Other descriptors can also be included in the analysis, but the user must precompute them.

Methods

The first part of the analysis inspects each descriptor separately, that is, univariately, by comparing the distribution of the selected descriptor between the two datasets. For static data, interactive plots (including quantile-quantile plots, histograms and boxplots), descriptive statistics and the two-sample Kolmogorov-Smirnov test together give a comprehensive view of how the distributions compare.

The analysis of time-lapse sequences is also based on distribution comparison. For each descriptor, the plot shows basic descriptive statistics, such as the minimum, maximum, mean, median and interquartile range (IQR), for every timepoint (frame) of the sequence, and each statistic is connected across consecutive frames to form a curve over time. When the two datasets are similar, their IQR bands overlap over most of their extent, and their mean and median curves have a similar shape and position.
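The univariate comparison can be sketched in plain Python as follows. This is a minimal illustration under our own assumptions, not Compyda's implementation: `ks_statistic` computes the two-sample Kolmogorov-Smirnov statistic D (the largest vertical gap between the two empirical distribution functions) without a p-value, `frame_summary` and `summarize_sequence` compute the per-frame statistics used for time-lapse data, and `iqr_overlap` is a hypothetical helper that turns the visual "do the IQR bands overlap?" check into a number between 0 and 1. All function names and the quartile convention are assumptions.

```python
from bisect import bisect_right
from statistics import mean, median, quantiles


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions that only jump at observed
    # values, so the maximum difference is reached at one of those values.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def frame_summary(values):
    """Descriptive statistics of one descriptor in one frame (needs >= 2 values)."""
    # The quartile convention is an assumption; tools differ in how they interpolate.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "max": max(values), "mean": mean(values),
            "median": median(values), "q1": q1, "q3": q3}


def summarize_sequence(frames):
    """One summary per frame; joining them frame by frame gives the curves."""
    return [frame_summary(values) for values in frames]


def iqr_overlap(s1, s2):
    """Hypothetical helper: shared fraction of two IQR intervals (0 to 1)."""
    shared = min(s1["q3"], s2["q3"]) - max(s1["q1"], s2["q1"])
    span = max(s1["q3"], s2["q3"]) - min(s1["q1"], s2["q1"])
    if span == 0:  # both IQRs collapse to the same single point
        return 1.0
    return max(0.0, shared) / span
```

For example, `summarize_sequence(original_frames)` and `summarize_sequence(augmented_frames)` give two lists of summaries that can be paired frame by frame with `iqr_overlap`. If a p-value is also needed, `scipy.stats.ks_2samp` computes both the statistic and the p-value.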

Univariate analysis can reveal dissimilarities in individual descriptors; however, it cannot capture the overall multivariate distribution of data points, nor can it identify possible outliers in a multidimensional space.
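A toy example makes this limitation concrete. The sketch below assumes two hypothetical descriptors per image and uses invented values purely for illustration: in both datasets each descriptor takes exactly the same values, so every univariate comparison (histograms, boxplots, the Kolmogorov-Smirnov test) finds no difference, yet the two descriptors are perfectly positively correlated in one dataset and perfectly negatively correlated in the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def same_marginals(a, b):
    """True if every descriptor has the same multiset of values in a and b."""
    return all(sorted(p[d] for p in a) == sorted(q[d] for q in b)
               for d in range(len(a[0])))


# Toy data: each point is a pair of hypothetical descriptors for one image.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
original = list(zip(xs, xs))             # both descriptors grow together
augmented = list(zip(xs, reversed(xs)))  # one grows while the other shrinks

print(same_marginals(original, augmented))  # True: univariately identical
print(pearson(*zip(*original)))             # close to +1
print(pearson(*zip(*augmented)))            # close to -1
```

Only a view of the joint distribution, such as the multivariate projections used in the tool, exposes this kind of discrepancy.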

In our tool, two methods of multivariate comparison are implemented: t-distributed stochastic neighbor embedding (t-SNE) and principal component analysis (PCA). Both methods project the multidimensional descriptor data into two or three dimensions, so their output is easy to interpret in 2D or 3D plots. Currently, multivariate analysis is available for static data only.
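The PCA part of the multivariate comparison can be sketched as follows. This is a plain-Python illustration under our own assumptions, not Compyda's code: it assumes that each image is represented by a vector of descriptor values and that PCA is fitted on the pooled original and augmented data, so both datasets end up in the same coordinate system. `pca_project` finds the principal axes by power iteration on the sample covariance matrix, and `pca_compare` is a hypothetical helper that splits the projection back into the two datasets for plotting. t-SNE is not shown.

```python
import random
from math import sqrt


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _unit_orthogonal(v, axes, tol=0.0):
    """Remove v's components along the unit axes, then normalize (None if ~0)."""
    for a in axes:
        proj = _dot(v, a)
        v = [x - proj * y for x, y in zip(v, a)]
    norm = sqrt(_dot(v, v))
    return None if norm <= tol else [x / norm for x in v]


def pca_project(points, k=2, iters=200, seed=0):
    """Project points (sequences of d descriptor values) onto the top-k principal axes."""
    n, d = len(points), len(points[0])
    k = min(k, d)
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    # Sample covariance matrix of the descriptors (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    tol = 1e-12 * max(cov[i][i] for i in range(d))  # relative "no variance left" cutoff
    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        v = _unit_orthogonal([rng.uniform(-1.0, 1.0) for _ in range(d)], axes)
        for _ in range(iters):
            # Power-iteration step v <- C v, kept orthogonal to the axes already
            # found, so it converges to the next direction of largest variance.
            w = _unit_orthogonal([_dot(row, v) for row in cov], axes, tol)
            if w is None:  # no variance left outside the axes found so far
                break
            v = w
        axes.append(v)
    return [[_dot(c, a) for a in axes] for c in centered]


def pca_compare(original, augmented, k=2):
    """Hypothetical helper: fit PCA on pooled data, return both projected sets."""
    projected = pca_project(list(original) + list(augmented), k)
    return projected[:len(original)], projected[len(original):]
```

In practice, a library implementation such as scikit-learn's `sklearn.decomposition.PCA` or `sklearn.manifold.TSNE` would normally be used. Principal axes are only defined up to sign, so a projected plot may appear mirrored between tools or runs. Descriptors measured on very different scales are commonly standardized before PCA so that no single descriptor dominates the projection.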

Features

Acknowledgement

We acknowledge the support of the Ministry of Education, Youth and Sports of the Czech Republic (MEYS CR) (Czech-BioImaging Projects LM2023050 and CZ.02.1.01/0.0/0.0/18_046/0016045).

Credits

Contact

If you encounter any problems or would like to contact the developers, please write an email to cbia-compyda@fi.muni.cz.