I need to support queries on dynamic (not predefined) tags in
Elasticsearch. Let's say I have a blog document and want to support
queries on an arbitrary set of columns, e.g. tagTypeA=valueX &
tagTypeB=ValueY, where these tagTypeX columns are not known beforehand.
Each of these tags will have only one value. The user will pass this
additional data to my API as a Map<String, String> (no strict model /
structure).
I am thinking of three ways to support this feature.
- Option 1: Declare that I can support at most N types of dynamic tags
  per document (say 10) and create internal columns Tag1, Tag2 ...
  Tag10. Maintain a config that maps TagTypeA=Tag1, TagTypeB=Tag2, etc.
  In the code, iterate over the input key-value pairs and generate the
  ES search query dynamically using the key-to-column-name mapping.
  Pros: Simple to implement.
  Cons: Overhead of maintaining the mapping. It has to be modified
  every time a new type of document/client is onboarded or a new field
  has to be added for an existing client.
- Option 2: Create a non-analyzed field in ES holding an array of
  strings. When storing the data, store each pair in a concatenated
  format of key+"Delimiter"+value. So if the input map has
  TagTypeA=Good & TagTypeB=High, this will be stored as
  ["TagTypeA-Good","TagTypeB-High"] in ES. When the user queries,
  reconstruct the concatenated strings and search for them.
  Pros: No code changes required to onboard new clients or to add or
  update fields.
  Cons: First of all, it doesn't sound clean. The key must not contain
  the delimiter. Changing the mapping at a later point is very tedious,
  as we would have to rewrite all existing stored string values.
- Option 3: Don't define any schema and let the JSON key-value pairs of
  tags pass through to the Elasticsearch PUT call. For any new keys
  that are not already present, Elasticsearch will automatically add
  them to the index mapping with default type inference (which can be
  controlled using dynamic templates).
  Pros: No configuration or manual concatenation of input. Any addition
  of columns is handled transparently without manual effort.
  Cons: We are relying on the default index creation settings of ES,
  which may not always suit the requirement. I feel there will be more
  cons to this, but I can't think of any. Please suggest.
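To make the three options concrete, here is a rough Python sketch of what each one would send to Elasticsearch. It only builds request bodies as plain dicts, so it does not depend on any client library. The names TAG_COLUMNS, DELIMITER, the "tags" field and the "tags.*" path are placeholders I made up for illustration, and I am assuming a recent ES version where "keyword" is the non-analyzed string type.

```python
from typing import Dict, List

# Option 1: hypothetical config mapping each tag type to a fixed column.
TAG_COLUMNS: Dict[str, str] = {"TagTypeA": "Tag1", "TagTypeB": "Tag2"}

# Option 2: delimiter placed between key and value; keys must not contain it.
DELIMITER = "-"


def _bool_filter(clauses: List[dict]) -> dict:
    """Wrap term clauses in a bool/filter query (every clause must match)."""
    return {"query": {"bool": {"filter": clauses}}}


def option1_query(tags: Dict[str, str]) -> dict:
    """Translate each tag type to its internal column via the config."""
    # A KeyError here means the tag type is missing from the config.
    return _bool_filter(
        [{"term": {TAG_COLUMNS[key]: value}} for key, value in tags.items()]
    )


def option2_values(tags: Dict[str, str]) -> List[str]:
    """Build the key + delimiter + value strings stored in one array field."""
    for key in tags:
        if DELIMITER in key:
            raise ValueError(f"tag key {key!r} contains the delimiter")
    return [f"{key}{DELIMITER}{value}" for key, value in tags.items()]


def option2_query(tags: Dict[str, str]) -> dict:
    """Rebuild the concatenated strings and match each one exactly."""
    return _bool_filter([{"term": {"tags": v}} for v in option2_values(tags)])


# Option 3: mappings fragment with a dynamic template, so that any new
# string field appearing under "tags" is indexed as keyword instead of
# using the default type inference.
OPTION3_MAPPINGS = {
    "dynamic_templates": [
        {
            "tags_as_keyword": {
                "path_match": "tags.*",
                "match_mapping_type": "string",
                "mapping": {"type": "keyword"},
            }
        }
    ]
}


def option3_query(tags: Dict[str, str]) -> dict:
    """Query the pass-through fields directly by their original key names."""
    return _bool_filter(
        [{"term": {f"tags.{key}": value}} for key, value in tags.items()]
    )
```

With the input {"TagTypeA": "Good", "TagTypeB": "High"}, option2_values returns ["TagTypeA-Good", "TagTypeB-High"], matching the example in Option 2, while option1_query filters on Tag1 and Tag2 and option3_query filters on tags.TagTypeA and tags.TagTypeB.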
I am personally leaning towards Option #3.
Can anyone please share their views on the above three approaches, and whether there is a better way to solve this?
Thanks,
Harish