Structuring data for hierarchy


#1

Hi there,

I'm trying to structure a log file that holds hierarchical information, which I would like to aggregate on at different levels. My data may look like animal::mammal::dog::collie or fruit::vegetable::carrot, where the levels are separated by two colons (that's a separate issue) and the number of levels can vary.

Currently I'm splitting using mutate and split on ::, which converts my string to an array.

The problem is that in Kibana the data in the array is handled as a "grab bag" of terms, so I can't get the hierarchy back out of it. I want to graph based on any term, but only where all previous terms match. For example, to aggregate on the third term, I need the first and second terms to be the same, so a dog aggregation would require that dog is the 3rd term and that animal and mammal are the 1st and 2nd respectively.

While I am using Kibana to get an idea of how the data looks while I'm putting it together, my goal is to get it to work as an aggregation directly from Elasticsearch. But I'd like to know the ideal way of structuring this data before I move on to the next step.

Is there a better way to store this kind of data than in an array? Should I hold each value in a separate field (item1, item2, item3, etc.) instead of an array (items) of values? Would this make it faster to aggregate?
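
To make the two layouts concrete, here is a minimal Python sketch of what each document would look like. It is only an illustration, not the actual Logstash config: the function names are made up, and the field names items and item1, item2, ... are just the placeholders used in the question above.

```python
def split_levels(path, sep="::"):
    """Split a hierarchy string such as 'animal::mammal::dog' into its levels."""
    return path.split(sep)


def as_array_doc(path):
    """Layout 1: all levels stored in one array field (hypothetical name 'items')."""
    return {"items": split_levels(path)}


def as_fields_doc(path):
    """Layout 2: one field per level (hypothetical names 'item1', 'item2', ...)."""
    return {
        f"item{position}": level
        for position, level in enumerate(split_levels(path), start=1)
    }


if __name__ == "__main__":
    example = "animal::mammal::dog::collie"
    print(as_array_doc(example))
    print(as_fields_doc(example))
```

In the second layout the position of each term is part of the field name, so "dog as the 3rd term" can be expressed as a condition on item3 together with item1 and item2.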


(Yannick Welsch) #2

It depends on the kind of queries you want to do. Note, however, that an array of items loses the order of the items ( https://www.elastic.co/guide/en/elasticsearch/guide/current/complex-core-fields.html ). This means that queries are unable to distinguish the different levels in the hierarchy if you encode them by just splitting levels on "::". If you want to go with the array approach and keep the hierarchy information, you can use something like the path hierarchy tokenizer ( https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pathhierarchy-tokenizer.html ). It comes with limitations, however, if you want to do aggregations (see http://stackoverflow.com/questions/24819234/elasticsearch-using-the-path-hierarchy-tokenizer-to-access-different-level-of ). The most flexible solution for querying is probably to use separate fields. This only makes sense, though, if your hierarchy does not have too many levels. Performance depends very much on the kind of aggregations done (here, they are probably combined with some filters as well).
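
As a rough illustration of how the hierarchy can be kept, the Python sketch below emulates the kind of cumulative prefix tokens that a path-hierarchy style tokenizer emits, and then counts documents at a chosen depth. This is plain Python, not Elasticsearch code: the function names are hypothetical, and splitting on "::" is an assumption taken from the question.

```python
from collections import Counter


def prefix_paths(path, sep="::"):
    """Return every ancestor prefix of a path, shortest first.

    Each prefix carries all of its parent levels, so two terms only
    match when everything above them matches as well.
    """
    levels = path.split(sep)
    return [sep.join(levels[:depth]) for depth in range(1, len(levels) + 1)]


def count_at_depth(paths, depth, sep="::"):
    """Count paths grouped by their prefix of the given depth (1-based).

    Paths with fewer levels than `depth` are skipped.
    """
    counts = Counter()
    for path in paths:
        prefixes = prefix_paths(path, sep)
        if len(prefixes) >= depth:
            counts[prefixes[depth - 1]] += 1
    return counts


if __name__ == "__main__":
    logs = [
        "animal::mammal::dog::collie",
        "animal::mammal::dog::poodle",
        "animal::mammal::cat",
        "fruit::vegetable::carrot",
    ]
    print(prefix_paths(logs[0]))
    print(count_at_depth(logs, 3))
```

Grouping on the depth-3 prefix puts both dog entries into the single bucket animal::mammal::dog, which is the "all previous terms match" behaviour asked for. With separate fields, the same grouping would correspond to combining the first three level fields.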

