Medical imaging data (CT, MRI, echo, etc.) is often stored as DICOM files, which are hosted by a PACS server. The system is horribly old and its design is exotic. Each patient, participant or sample has a hierarchical structure of studies, which contain series, which contain images. Each image is then a separate file with a key-value store for meta-information (patient name, location, scan modality, scanner protocol, etc.). One exotic aspect is that the key-value data is duplicated across the separate image files. A 3D medical image is represented by several DICOM files, each containing a single 2D image.
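To make the hierarchy concrete, here is a minimal sketch in plain Python that groups per-file headers into patient, study, series and image levels. The `headers` list and the `build_hierarchy` helper are made up for illustration. The keys are standard DICOM attribute keywords, but a real file carries many more tags and would be read with a DICOM library rather than written out by hand.

```python
# Hypothetical per-file headers: one dict per DICOM file.
# Note how patient/study/series information is repeated in every file.
headers = [
    {"PatientID": "P1", "StudyInstanceUID": "S1", "SeriesInstanceUID": "A",
     "InstanceNumber": 2, "Modality": "CT"},
    {"PatientID": "P1", "StudyInstanceUID": "S1", "SeriesInstanceUID": "A",
     "InstanceNumber": 1, "Modality": "CT"},
    {"PatientID": "P1", "StudyInstanceUID": "S1", "SeriesInstanceUID": "B",
     "InstanceNumber": 1, "Modality": "CT"},
]


def build_hierarchy(file_headers):
    """Group flat per-file headers into patient -> study -> series -> slices."""
    tree = {}
    for h in file_headers:
        studies = tree.setdefault(h["PatientID"], {})
        series = studies.setdefault(h["StudyInstanceUID"], {})
        slices = series.setdefault(h["SeriesInstanceUID"], [])
        slices.append(h["InstanceNumber"])
    # Sort the slices of each series so they can be stacked into a 3D volume.
    for studies in tree.values():
        for series in studies.values():
            for slices in series.values():
                slices.sort()
    return tree


hierarchy = build_hierarchy(headers)
print(hierarchy)
```

Series "A" ends up with two slices and series "B" with one, all under the same patient and study, even though every input record repeated the patient and study identifiers.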
There is also a bunch of other data attached, such as:
- Annotations of regions of sick tissue, tumors, etc., made either by a medical expert or by an algorithm
- Tables, often with numeric values
- Graphs, which sometimes illustrate where the patient's measured values lie within the general population
I've seen these annotations stored as several DICOM files, where the marked region has a specific, semi-transparent color and the image itself contains a legend. Tables and graphs are often also stored as rendered images.
This is adequate for clinical work, but in the era of data science I would think that better storage solutions exist, in particular Elasticsearch (ES).
Hence I was wondering whether extending ES with multi-dimensional (numerical) arrays is reasonable. That is, a data type that stores nD numerical data efficiently. This way, the medical metadata and the aforementioned hierarchy could be stored and searched effectively, while medical imaging scans, reports and all kinds of numerical data are attached to the same documents.
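To illustrate what I have in mind, here is a rough sketch of a JSON document that carries searchable metadata next to an nD array, assuming the array is stored as a flat list of values plus its shape. The field names (`volume`, `shape`, `values`) and the helper functions are made up for this example. This is not an existing ES data type, only the kind of structure such a type would need to represent.

```python
import json
from math import prod


def shape_of(nested):
    """Return the shape of a regular nested list, e.g. [2, 2, 2]."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return shape


def flatten(nested):
    """Flatten a nested list into a flat list in row-major order."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat


def reshape(values, shape):
    """Rebuild a nested list from flat values and a shape."""
    if len(shape) == 1:
        return list(values)
    step = prod(shape[1:])
    return [reshape(values[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def to_document(metadata, array):
    """Combine metadata and an nD array into one JSON-serializable dict."""
    return {**metadata,
            "volume": {"shape": shape_of(array), "values": flatten(array)}}


# A tiny 2x2x2 "scan" standing in for a stack of 2D slices.
volume = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
doc = to_document({"PatientID": "P1", "Modality": "CT"}, volume)

# Round-trip through JSON, as a document store would.
loaded = json.loads(json.dumps(doc))
restored = reshape(loaded["volume"]["values"], loaded["volume"]["shape"])
```

A flat list of numbers in JSON is obviously not an efficient encoding for real scans, which is exactly why I am asking about a dedicated array type.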
Would a 'numpy.ndarray' data type in ES be feasible and reasonable?