Backwards aggregation

I'm not sure what to call this problem, so "backwards aggregation" will have to suffice, even though it's a confusing name :)

My problem is that I'm not able to aggregate my data as I would like because it's not structured the way I would prefer.

My data contains results from chemical analyses, with statistics for each element: quantity (the quantification), mean, and standard deviation.

This is the structure of the data:

    {"Quantifications": {
        <Element ID> : {
            "Quantity" : float,
            "Mean" : float,
            "StandardDeviation" : float
        }
    }

So, for example, if I have a document with results from three elements, say aluminium (Al), iron (Fe) and copper (Cu), plus possibly some more, it would look like this:

    {"Quantifications: {
        "Al":{
        	"Quantity" : 0.012,
            "Mean" : 0.011,
            "StandardDeviation" : 0.003
        },
        "Fe":{
        	"Quantity" : 0.016,
            "Mean" : 0.014,
            "StandardDeviation" : 0.005
        },
        "Cu":{
        	"Quantity" : 0.002,
            "Mean" : 0.001,
            "StandardDeviation" : 0.001
        },
        ...
        ...
        ...
    }

Now what I would like to do is show the quantity of all the elements on a single graph, with quantity on the Y-axis and timestamp on the X-axis, split by element, so that Al, Fe and Cu each get their own line. But since each <Element ID> sits above the "Quantity" field (the element is an object key, not a value), I'm not able to simply aggregate "Quantity" by <Element ID>. However, if the data looked like the example below, aggregating "Quantity" per element would be simple:

    {"Quantifications: {
        "Quantity" :{
        	"Al" : 0.012,
        	"Fe" : 0.016,
        	"Cu" : 0.002,
        	...
        },
        "Mean" : {
        	"Al" : 0.011,
        	"Fe" : 0.014,
        	"Cu" : 0.001,
        	...
        },
        "StandardDeviation" : {
        	"Al" : 0.003,
        	"Fe" : 0.005,
        	"Cu" : 0.001,
        	...
        }
    }
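
For illustration, with that layout each element's quantity would be its own field, so charting one line per element would just be an avg of each field under a date histogram. This is only a sketch of what I mean; the index name quantifications and the @timestamp field are assumptions on my part:

    GET quantifications/_search
    {
      "size": 0,
      "aggs": {
        "over_time": {
          "date_histogram": { "field": "@timestamp", "calendar_interval": "day" },
          "aggs": {
            "al_quantity": { "avg": { "field": "Quantifications.Quantity.Al" } },
            "fe_quantity": { "avg": { "field": "Quantifications.Quantity.Fe" } },
            "cu_quantity": { "avg": { "field": "Quantifications.Quantity.Cu" } }
          }
        }
      }
    }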

But before I go ahead and change the structure of the whole dataset, which is a bit cumbersome because of other dependencies in my code, I would like to ask: is there a proper way to deal with this in Elasticsearch without changing the structure of the data?

If you only want to query the data in Elasticsearch, it's doable. But if you want to use it in Kibana, you need to flatten your data. For example, with your Aluminium, Iron and Copper data, each element should be a separate document:

       { "Element" : "Al",
       	"Quantity" : 0.012,
        "Mean" : 0.011,
        "StandardDeviation" : 0.003
        }

and so on
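
With documents flattened like that, a terms aggregation per element under a date histogram gives you exactly one line per element, which is what Kibana's visualizations build for you. A rough sketch of the equivalent query (the index name, the Element.keyword mapping and the @timestamp field are assumptions):

    GET quantifications-flat/_search
    {
      "size": 0,
      "aggs": {
        "over_time": {
          "date_histogram": { "field": "@timestamp", "calendar_interval": "day" },
          "aggs": {
            "per_element": {
              "terms": { "field": "Element.keyword" },
              "aggs": {
                "avg_quantity": { "avg": { "field": "Quantity" } }
              }
            }
          }
        }
      }
    }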

That makes total sense.

So I have two ways of doing this:

  • Split the data before it goes into Elasticsearch
  • Have Elasticsearch split the data for me

I know how to do the first option, but it's not the cleanest.

So I'm wondering: is it possible to go for the second option and have Elasticsearch automatically split one document into many, possibly writing them into another index, so I can work more efficiently with the data in Kibana?

Or are there possibly some other methods that work better?

You can reshape documents in Elasticsearch at ingest time with ingest pipelines: https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html There are multiple processors, so it's a long read. I don't have much hands-on experience with them, but I know the Elasticsearch team uses them with great success. If you don't manage to figure them out from the docs, you can ask in the Elasticsearch part of this forum for some hints.
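
As a minimal sketch (the pipeline name and the Elements field are my own inventions), a script processor can turn the Quantifications map into an array of per-element objects. One caveat: an ingest pipeline reshapes a document but still outputs a single document, so to get one document per element, as suggested above, the split has to happen before indexing, for example in the code that sends the data.

    PUT _ingest/pipeline/split-quantifications
    {
      "description": "Sketch (names assumed): turn the Quantifications map into an array of per-element objects",
      "processors": [
        {
          "script": {
            "lang": "painless",
            "source": "def out = []; for (entry in ctx.Quantifications.entrySet()) { def e = [:]; e.Element = entry.getKey(); e.putAll(entry.getValue()); out.add(e); } ctx.Elements = out; ctx.remove('Quantifications');"
          }
        }
      ]
    }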

