Supporting queries on dynamic columns in Elasticsearch


(Harish Kommaraju) #1

I need to support queries on dynamic (not predefined) tags in
Elasticsearch. Let's say I have a blog document and want to support
queries on different sets of columns, e.g. tagTypeA=valueX &
tagTypeB=ValueY, where these tagTypeX columns are not known beforehand.
There will be only one value for each of these tags. The user will pass
this additional data to my API as a Map<String, String> (no strict
model / structure).

I am thinking of three ways to support this feature.

  1. Declare that I can support only a maximum of N types of dynamic tags
    per document (say 10) and create internal columns like Tag1, Tag2
    ... Tag10. Then maintain a config that maps TagTypeA=Tag1,
    TagTypeB=Tag2, etc. In the code, iterate over the input key-value pairs
    and generate the ES search query dynamically using the key-to-columnName
    mapping.
    Pros : Simple to implement.
    Cons : Overhead of maintaining the mapping. It has to be modified
    every time a new type of document/client is onboarded or a new field
    has to be added for an existing client.

  2. Create a non-analyzed field in ES holding an array of strings. When
    storing the data, store each pair in a concatenated format of
    key+"Delimiter"+value. So if the input map has TagTypeA=Good &
    TagTypeB=High, then this will be stored as
    ["TagTypeA-Good","TagTypeB-High"] in ES. When the user queries,
    construct the concatenated strings again and search for them.
    Pros : No code changes required to onboard new clients or to add or
    update fields.
    Cons : First of all, it doesn't sound clean. The key must not contain
    the delimiter. Changing the mapping at a later point in time is very
    tedious, as we would have to change all existing string values.

  3. Don't define any schema and let the JSON key-value pairs of tags
    pass through to the Elasticsearch PUT call. For any new keys which are
    not already present, Elasticsearch will automatically add them to the
    index mapping with default type inference (which can be controlled
    using dynamic templates).
    Pros : No configuration or manual concatenation of input. Any addition
    of columns is handled transparently without any manual effort.
    Cons : We are relying on the default index creation settings of ES,
    which may not always suit the requirement. I feel there will be more
    cons to this, but I can't think of any. Please suggest.
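
To make the comparison concrete, here is a minimal Python sketch of how each option might turn the incoming map into a search request body. It only builds plain dictionaries and does not call Elasticsearch. The names TAG_COLUMNS, the "tags" field, the "tags." prefix and the "-" delimiter are assumptions made for illustration. The option 3 variant also assumes a dynamic template that maps new string fields to keyword, since the default inference would analyze them as text.

```python
from typing import Dict

# Option 1 (hypothetical config): external tag name -> preallocated column.
TAG_COLUMNS: Dict[str, str] = {"TagTypeA": "Tag1", "TagTypeB": "Tag2"}

# Option 2 (assumed delimiter): keys must never contain this character.
DELIMITER = "-"

# Option 3 (assumed index body): map every new string field under "tags"
# to keyword, so exact-match term queries behave as expected.
DYNAMIC_TEMPLATE_MAPPING = {
    "mappings": {
        "dynamic_templates": [
            {
                "tags_as_keyword": {
                    "path_match": "tags.*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ]
    }
}


def query_fixed_columns(tags: Dict[str, str]) -> dict:
    """Option 1: translate each key to its internal column name.

    Raises KeyError for a tag that has not been added to TAG_COLUMNS.
    """
    filters = [{"term": {TAG_COLUMNS[key]: value}} for key, value in tags.items()]
    return {"query": {"bool": {"filter": filters}}}


def query_concatenated(tags: Dict[str, str], field: str = "tags") -> dict:
    """Option 2: rebuild key+delimiter+value and match each string exactly."""
    filters = []
    for key, value in tags.items():
        if DELIMITER in key:
            raise ValueError(f"tag key {key!r} contains the delimiter")
        filters.append({"term": {field: f"{key}{DELIMITER}{value}"}})
    return {"query": {"bool": {"filter": filters}}}


def query_dynamic_fields(tags: Dict[str, str], prefix: str = "tags.") -> dict:
    """Option 3: every key is its own (dynamically mapped) field."""
    filters = [{"term": {f"{prefix}{key}": value}} for key, value in tags.items()]
    return {"query": {"bool": {"filter": filters}}}
```

For the input {"TagTypeA": "Good", "TagTypeB": "High"}, all three functions produce a bool query with two term filters. They differ only in which field names and values the filters target.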

I am personally leaning toward Option #3.

Can anyone please share their views on the above three approaches, and whether there is a better way to solve this?

Thanks,
Harish


(Christian Dahlqvist) #2

Another option worth considering might be to store them as nested data types, each column stored as a document with a 'key' and 'value' field. This avoids having the mappings explode while still giving you a fair amount of flexibility.
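
A rough sketch of what this could look like, written as plain Python dictionaries. The field name "tags", the keyword types and the typeless mapping layout (as used by recent Elasticsearch versions) are assumptions, not something stated in this thread.

```python
from typing import Dict, List

# Assumed index body: one nested field holding {key, value} pairs.
# However many distinct tag names appear, the mapping keeps these two fields.
NESTED_MAPPING = {
    "mappings": {
        "properties": {
            "tags": {
                "type": "nested",
                "properties": {
                    "key": {"type": "keyword"},
                    "value": {"type": "keyword"},
                },
            }
        }
    }
}


def to_nested_tags(tags: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert the caller's map into the list stored in the document."""
    return [{"key": key, "value": value} for key, value in tags.items()]


def nested_tag_query(tags: Dict[str, str], path: str = "tags") -> dict:
    """Build one nested clause per tag; a document must satisfy all of them.

    Key and value are checked inside the same nested clause so that they
    must match on the same key/value pair, not on two different pairs.
    """
    clauses = [
        {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {f"{path}.key": key}},
                            {"term": {f"{path}.value": value}},
                        ]
                    }
                },
            }
        }
        for key, value in tags.items()
    ]
    return {"query": {"bool": {"filter": clauses}}}
```

A document would then carry something like "tags": to_nested_tags({"TagTypeA": "Good", "TagTypeB": "High"}), and nested_tag_query with the same map would match it.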


(Harish Kommaraju) #3

Thanks for your suggestion. I also need to do aggregations on the keys, i.e. queries like the number of entries having TagTypeA=23 & TagTypeB=45. Will performance be impacted by using a nested structure for these aggregations? (I understand that Option #2 is no longer valid when we need aggregations on these column names.)
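
For reference, the two kinds of request in question might be sketched as follows in Python, assuming the nested layout suggested above with a "tags" field containing keyword "key" and "value" subfields (all assumed names). This only shows the shape of the requests and says nothing about how they perform.

```python
from typing import Dict


def count_matching_body(tags: Dict[str, str], path: str = "tags") -> dict:
    """Query body for counting documents that carry every given tag.

    Intended for a count-style request: only the number of matches matters.
    """
    clauses = [
        {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {f"{path}.key": key}},
                            {"term": {f"{path}.value": value}},
                        ]
                    }
                },
            }
        }
        for key, value in tags.items()
    ]
    return {"query": {"bool": {"filter": clauses}}}


def values_per_key_body(key: str, path: str = "tags", size: int = 10) -> dict:
    """Search body that buckets the values seen for a single tag key.

    nested -> filter on the key -> terms on the value. Because each key has
    only one value per document, each bucket's doc_count should equal the
    number of parent documents with that value.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "tags": {
                "nested": {"path": path},
                "aggs": {
                    "only_key": {
                        "filter": {"term": {f"{path}.key": key}},
                        "aggs": {
                            "values": {
                                "terms": {"field": f"{path}.value", "size": size}
                            }
                        },
                    }
                },
            }
        },
    }
```

count_matching_body({"TagTypeA": "23", "TagTypeB": "45"}) corresponds to the example count above, while values_per_key_body("TagTypeA") would list the most frequent values of TagTypeA.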

