Structuring data for hierarchy

Dani · March 24, 2016, 2:56pm

Hi there,

I'm trying to structure a log file that holds hierarchical information, which I would like to aggregate on at different levels. My data might look like animal::mammal::dog::collie or fruit::vegetable::carrot: the levels are separated by two colons (that's a separate issue), and a value can have multiple levels.

Currently I'm splitting using mutate and split on ::, which converts my string to an array.

The problem is that in Kibana the data in the array is handled as a "grab bag" of terms, so I'm not able to get the hierarchy out of it. I want to graph based on any term, but only where all previous terms match. For example, if I want to aggregate on the third term, I need the first and second terms to be the same, so a dog aggregation would require that dog is the 3rd term and that animal and mammal are the 1st and 2nd respectively.
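To make the requirement concrete, here is a small sketch in plain Python (not Elasticsearch code). The sample records and the function names are made up for illustration. It contrasts counting a term anywhere in the array with counting it only when the preceding levels match.

```python
from collections import Counter

SEPARATOR = "::"

# Hypothetical sample records, in the same shape as the values described above.
records = [
    "animal::mammal::dog::collie",
    "animal::mammal::dog::poodle",
    "toy::plush::dog",
    "fruit::vegetable::carrot",
]


def grab_bag_counts(values):
    """Count each term once per record, ignoring its position in the hierarchy."""
    counts = Counter()
    for value in values:
        counts.update(set(value.split(SEPARATOR)))
    return counts


def level_counts(values, level):
    """Count records by their first `level` terms, so all ancestors must match."""
    counts = Counter()
    for value in values:
        terms = value.split(SEPARATOR)
        if len(terms) >= level:
            counts[tuple(terms[:level])] += 1
    return counts


bag = grab_bag_counts(records)
by_third = level_counts(records, 3)

print(bag["dog"])                              # dog anywhere in the array
print(by_third[("animal", "mammal", "dog")])   # dog as 3rd term under animal, mammal
```

The first count mixes the toy dog in with the animals, while the second keeps them apart, which is the behaviour I'm after.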

While I am using Kibana to get an idea of how the data looks while I'm putting it together, my goal is to get it to work as an aggregation directly from Elasticsearch. But I'd like to know the ideal way of structuring this data before I move on to the next step.

Is there a better way to store this kind of data than in an array? Should I hold each value in a separate field (item1, item2, item3, etc.) instead of in an array (items) of values? Would this make it faster to aggregate?

ywelsch · March 24, 2016, 5:53pm

It depends on the kind of queries you want to run. Note, however, that an array of items loses the order of the items once it is indexed (see Complex Core Field Types | Elasticsearch: The Definitive Guide [2.x] | Elastic). This means that queries cannot distinguish the different levels of the hierarchy if you encode them by just splitting on "::". If you want to stay with the array approach and keep the hierarchy information, you can use something like the path hierarchy tokenizer (see Path hierarchy tokenizer | Elasticsearch Guide [8.11] | Elastic). It comes with limitations, however, if you want to do aggregations (see Elasticsearch - using the path hierarchy tokenizer to access different level of categories - Stack Overflow). The most flexible solution for querying is probably to use separate fields, though this only makes sense if your hierarchy does not have too many levels. Performance depends very much on the kind of aggregations you run (here, they are probably combined with some filters as well).
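As a rough sketch of the two options, the Python below prepares a document both ways and builds a nested terms aggregation over the separate fields. The field names (item1, item2, ...), the aggregation names (level1, level2, ...) and the helper functions are assumptions made for illustration, and to_prefix_paths only imitates the idea behind the path hierarchy tokenizer rather than calling it. The aggregation body also assumes the item fields are indexed as exact, non-analyzed values.

```python
import json

SEPARATOR = "::"


def to_level_fields(value, prefix="item"):
    """Split a hierarchy string into separate fields: item1, item2, ..."""
    terms = value.split(SEPARATOR)
    return {f"{prefix}{i}": term for i, term in enumerate(terms, start=1)}


def to_prefix_paths(value):
    """Emit one token per ancestor path (imitates the path hierarchy idea)."""
    terms = value.split(SEPARATOR)
    return [SEPARATOR.join(terms[:i]) for i in range(1, len(terms) + 1)]


def nested_terms_aggs(depth, prefix="item"):
    """Build a search body with terms aggregations nested one level per field."""
    aggs = None
    # Build from the deepest level outwards so each level wraps the next one.
    for level in range(depth, 0, -1):
        node = {"terms": {"field": f"{prefix}{level}"}}
        if aggs is not None:
            node["aggs"] = aggs
        aggs = {f"level{level}": node}
    return {"size": 0, "aggs": aggs}


doc = to_level_fields("animal::mammal::dog::collie")
paths = to_prefix_paths("animal::mammal::dog::collie")
body = nested_terms_aggs(3)

print(doc)
print(paths)
print(json.dumps(body, indent=2))
```

With separate fields, each bucket at the third level sits inside its first- and second-level parents, so a dog bucket under animal and mammal stays distinct from a dog bucket under any other parents. The cost is one field per level, which is why this works best for shallow hierarchies.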
