Indexing hierarchical data

We have hierarchical data, such as regions and the countries within them. For instance, APAC contains Asia, which contains South East Asia, which contains India, Sri Lanka, Bangladesh, etc. A user can select APAC, Asia, South East Asia, or all or a few of the countries in South East Asia. Now say we have a document for a person entity whose location country is India. We have two ways to store the data:

  1. We store every region India falls under as an array of regions (APAC, Asia, South East Asia), so the document can be matched directly against whichever region the user selects.
  2. We store only the immediate region, South East Asia, and when the user selects APAC we translate it at query time into all of its child regions (South East Asia, North East Asia, Central Asia, etc.) and search for those.
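To make approach 1 concrete, here is a minimal Python sketch of the index-time side. The `PARENT` map, the field names `country` and `regions`, and the helper names are all made up for illustration, and it assumes `regions` is indexed as an exact-match (keyword-style) field so a single term query can hit it.

```python
# Hypothetical child -> parent map for the hierarchy described above.
PARENT = {
    "India": "South East Asia",
    "Sri Lanka": "South East Asia",
    "Bangladesh": "South East Asia",
    "South East Asia": "Asia",
    "North East Asia": "Asia",
    "Central Asia": "Asia",
    "Asia": "APAC",
}


def ancestors(node):
    """Return every region above `node`, nearest first."""
    chain = []
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain


def person_document(name, country):
    """Build the document to index, with all ancestor regions denormalized."""
    return {"name": name, "country": country, "regions": ancestors(country)}


def region_query(region):
    """Query body for approach 1: one term, whatever level the user picked."""
    # Assumes `regions` is an exact-match field (an assumption, not from the post).
    return {"query": {"term": {"regions": region}}}


doc = person_document("Asha", "India")
query = region_query("APAC")
```

The trade-off is that the hierarchy is baked into each document, so if a country moves to a different region, the affected documents need to be reindexed.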

Which approach sounds better?

It's really up to you.


The problem is that in approach 2, we end up listing almost all of the child regions in a terms query. Is there a practical limit on the number of terms in a terms query? What if we have 30 to 40 of them?

Not at 30 to 40 terms, no, but you'd want to test both approaches on your data to see which is more efficient.
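For comparison, the query-time expansion in approach 2 might look roughly like the sketch below. The `REGION_PARENT` map, the `region` field name and the function names are assumptions for illustration; each document is assumed to store only its immediate region in an exact-match field.

```python
from collections import defaultdict

# Hypothetical region -> parent region map (countries are not needed here,
# because documents store only their immediate region).
REGION_PARENT = {
    "Asia": "APAC",
    "South East Asia": "Asia",
    "North East Asia": "Asia",
    "Central Asia": "Asia",
}

# Invert the map once so children can be looked up by parent.
CHILDREN = defaultdict(list)
for child, parent in REGION_PARENT.items():
    CHILDREN[parent].append(child)


def descendants(region):
    """Return `region` plus every region below it, sorted for stable output."""
    found = []
    stack = [region]
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(CHILDREN.get(current, []))
    return sorted(found)


def expanded_region_query(region):
    """Query body for approach 2: a terms query over the expanded regions."""
    # Assumes `region` is an exact-match field (an assumption, not from the post).
    return {"query": {"terms": {"region": descendants(region)}}}


query = expanded_region_query("APAC")
```

Here the documents stay small and a hierarchy change only touches the lookup map, at the cost of a longer terms list per query. Timing both query shapes against real data is the only way to know which one wins.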