How do I index hierarchical data?

caddala · July 15, 2016, 5:16pm

I have data similar to file paths that I would like to index (basically a list of tokens separated by a delimiter).

> Ex data:
> a/b/c/d/e
> a/b/c/
> a/m/n
> x/y/z

Once the data is indexed, I should be able to query for the immediate children of a given token, as shown below.

> For prefix of a, immediate children are [b, m]
> for prefix of x, immediate children are [y]
> Also tokens at root would be [a,x]
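To make the expected behaviour concrete, here is a minimal in-memory sketch in plain Python. It is not an Elasticsearch solution, only an illustration of the lookup being asked for. The function names `build_children_index` and `immediate_children` are hypothetical, and the root is assumed to be represented by an empty prefix.

```python
from collections import defaultdict


def build_children_index(paths, delimiter="/"):
    """Map each prefix to the set of tokens found directly below it."""
    children = defaultdict(set)
    for path in paths:
        # Drop empty tokens so a trailing delimiter ("a/b/c/") is harmless.
        tokens = [t for t in path.split(delimiter) if t]
        parent = ""  # assumption: "" stands for the root
        for token in tokens:
            children[parent].add(token)
            parent = token if not parent else parent + delimiter + token
    return children


def immediate_children(index, prefix=""):
    """Return the sorted immediate children of a prefix ("" means root)."""
    return sorted(index.get(prefix, set()))


paths = ["a/b/c/d/e", "a/b/c/", "a/m/n", "x/y/z"]
index = build_children_index(paths)
print(immediate_children(index, "a"))  # ['b', 'm']
print(immediate_children(index, "x"))  # ['y']
print(immediate_children(index))       # ['a', 'x']
```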

Glen_Smith · July 17, 2016, 12:00am

Have a look at the Path Hierarchy Tokenizer.

caddala · July 17, 2016, 1:56am

I actually looked into it. I was able to use the path tokenizer so that a string such as a.b.c would create the terms [a, a.b, a.b.c]. But I am not sure it can address the kind of search query I am looking for.
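For readers unfamiliar with the tokenizer, the term expansion described above can be approximated in plain Python. This is only a rough sketch of the output shape, not the Elasticsearch implementation. The helper name `path_hierarchy_tokens` is hypothetical, and the input is assumed to have no leading or trailing delimiter.

```python
def path_hierarchy_tokens(text, delimiter="/"):
    """Emit one term per ancestor prefix, shortest first."""
    parts = text.split(delimiter)
    # Join the first i parts back together for i = 1 .. len(parts).
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


print(path_hierarchy_tokens("a.b.c", delimiter="."))  # ['a', 'a.b', 'a.b.c']
```

Each term carries its full ancestry, which is why a search on `a.b` matches every document beneath it, but the terms alone do not say which ones are immediate children.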

I looked at the terms aggregation, which could help with this kind of data, but with lots of data a terms aggregation cannot support this kind of real-time query. For example, just to find the root tokens, it has to run an aggregation at that level. Imagine having millions of documents and running an aggregation at that level.
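For reference, the aggregation approach mentioned above might look roughly like the request body built below. This is a sketch under assumptions: the field name `path_tree` is hypothetical and is assumed to hold the path-hierarchy terms with `/` as the delimiter, the prefix is assumed to contain no regex-special characters, and the terms aggregation's `include` option is used to keep only terms exactly one level below the prefix.

```python
import json


def children_agg_body(prefix="", field="path_tree", size=100):
    """Build a search body whose buckets are the terms one level below prefix."""
    # With "/" as delimiter, "[^/]+" matches exactly one path segment.
    pattern = "[^/]+" if not prefix else prefix + "/[^/]+"
    return {
        "size": 0,  # return buckets only, no hits
        "aggs": {
            "children": {
                "terms": {"field": field, "include": pattern, "size": size}
            }
        },
    }


print(json.dumps(children_agg_body("a"), indent=2))
```

The bucket keys would be full prefixes such as `a/b` and `a/m`, so the last segment still has to be split off on the client. The aggregation also has to run over every matching document, which is the cost concern raised in the post.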
