Kibana 5 : Group by a field value and display average for each group


(Manjunatha Nakshathri) #1

Hello there,

I am relatively new to es and kibana. We are using latest kibana 5 for our dashboards. Below is the sample format of our document :

{
"country": "IN",
"places": [
{
"name": "restaurant-a",
"cost" : 100
},
{
"name": "restaurant-b",
"cost" : 50
}
]
}

Second doc sample
{
"country": "US",
"places": [
{
"name": "restaurant-a",
"cost" : 100
},
{
"name": "restaurant-C",
"cost" : 20
}
]
}

Requirement:
We need to show the average cost for each of the restaurants across all countries. We tried to do this with the simple avg metric but looks like its taking avg based on the total number of documents. This will not be correct as all restaurants will not be present in each of the document.

Any approach to resolve this will be of great help.

Regards
Manjunath


(Jim Unger) #2

In order to aggregate on the data you have in your places field, you are going to need to flatten that data and reindex it. When you index the document in the structure you have shown, all of the indexed tokens/values become details of the overall document. So, document 1 has a places.name property, with values 'restaurant-a', 'restaurant-b', and a places.cost property with values 100 and 50. The link between 'restaurant-a' and 100 is lost.


(Manjunatha Nakshathri) #3

Thank you for the clarification. Can you please suggest how I can flatten and reindex it?


(Jim Unger) #4

You could structure your index so that the documents looked something like this instead:

{
  "country": "IN",
  "name": "restaurant-a",
  "cost" : 100
},
{
  "country": "IN",
  "name": "restaurant-b",
  "cost" : 50
}

You get duplication of the country field in this case, but that shouldn't be a problem. As for how you accomplish this, it depends on what method you're using the index this data. Ideally, you should transform the data at index time to be in a flat format.


(Manjunatha Nakshathri) #5

Changing the structure during indexing time itself will be bit tough as it will change some of the business flows.

Is it possible to maintain the nested format for application business flows but create a parallel index which is of flat structure ? For the new documents this might probably doable. But with the older data, is it possible to flatten it by any means ?

I tried with reindexing all the documents by selecting only name and cost fields. But this will recreate the original index without the country field but the older nested objects remains. I know this is expected behavior but trying to see is there any way with which I can migrate existing records.

Please suggest.


(Jim Unger) #6

I stress again, that you may be losing the perceived relationship in your data, so make sure that your 'application business flows' are making correct assumptions.

I don't think there is an easy way to split your documents into multiple documents using Elasticsearch. You might be able to accomplish something using the Reindex API along with an ingest pipeline but that seems like a lot of work for something that shouldn't really be used for more that just a one time fix. They are too inefficient to be used as part of your normal data ingest processes in production.

You should modify your data ingestion processes to index your data into a second Elasticsearch index in a flattened format if you need to maintain the shape of the documents for other purposes.


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.