Using multiple values for aggregations for the same field

(Henrik Ossipoff Hansen) #1

We have a use case that we are trying to crack, but are not entirely sure how this would be best handled. Essentially, it boils down to being able to save "multiple values" for the same field in aggregations, while also being able to filter on them.

The main idea for our data is that we want to 100% separate the search data from our other data, i.e. without relying on IDs that have to be looked up later in a relational database or some other Elasticsearch index.

The values we want to use for lookup/filtering will always be "slugs" of values, since these come from a special "Search URL DSL" (e.g. a URL could be /video-games-and-consoles/brand=nintendo to filter to the category "Video Games and Consoles" and filter the attribute/facet Brand on the value "Nintendo").
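As a rough sketch of how such a path could be turned into slug filters, the Python below assumes a hypothetical layout in which every path segment is either a category slug or a facet=value pair. Only the single example URL above comes from the thread; `parse_search_path` is an illustrative name.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import unquote


def parse_search_path(path: str) -> Tuple[Optional[str], Dict[str, List[str]]]:
    """Split a search path into a category slug and facet filters.

    Hypothetical layout: /<category-slug>/<facet>=<value-slug>/...
    A segment without '=' is treated as the category.
    """
    category: Optional[str] = None
    filters: Dict[str, List[str]] = {}
    for segment in path.strip("/").split("/"):
        if not segment:
            continue  # tolerate empty segments such as "//"
        segment = unquote(segment)
        if "=" in segment:
            facet, value = segment.split("=", 1)
            # A facet may repeat, e.g. brand=nintendo/brand=sony.
            filters.setdefault(facet, []).append(value)
        else:
            category = segment
    return category, filters


print(parse_search_path("/video-games-and-consoles/brand=nintendo"))
# ('video-games-and-consoles', {'brand': ['nintendo']})
```

Everything this returns is a slug, which is why the index needs a slug it can filter on.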

This means, in the ideal world, the document would contain both the slug and the actual value.

A sample document could look like:

  {
    "category": "Video Games and Consoles",
    "brand": "Nintendo"
  }

This layout would make aggregations very easy: as long as every field uses the keyword analyzer, every bucket would be displayed correctly.

However, it is then not really possible to apply a post_filter on the slug, since the slug isn't in the document.
Nor is it possible to get the slug from the value without transforming the data in the application.
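To make the one-way nature of that transformation concrete, here is one possible slug rule sketched in Python. The rule itself (ASCII folding, lower-casing, collapsing everything else into hyphens) and the name `slugify` are assumptions; the thread does not say how its slugs are produced.

```python
import re
import unicodedata


def slugify(label: str) -> str:
    """Derive a URL slug from a display label (one possible convention)."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    )
    # Lower-case, then collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    return slug.strip("-")


print(slugify("Video Games and Consoles"))  # video-games-and-consoles
```

Going from label to slug is easy; going back is not, because a slug such as video-games-and-consoles has already lost the original capitalization. That is why the label has to be stored somewhere alongside the slug.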

One solution could be to make two category fields, one with the slug and one with the value, but since there is no connection between them, aggregations could not pair each slug with its value.

A proposed document could be:

  {
    "category": "video-games-and-consoles;Video Games and Consoles",
    "brand": "nintendo;Nintendo"
  }

Now, when making aggregations, each bucket will effectively have both the slug and the value to show to the customer, with the application only needing to split on the delimiter.
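The application-side half of this proposal, building the combined value at index time and splitting each bucket key at display time, might look like the Python sketch below. It assumes ';' never appears in a slug; the function names and the bucket counts are made up for illustration.

```python
from typing import Tuple

DELIMITER = ";"


def combine(slug: str, label: str) -> str:
    """Build the single keyword value indexed for aggregations."""
    # Only the slug must be free of the delimiter: splitting happens at the
    # first ';', so a label such as "Rock; Pop" still round-trips.
    if DELIMITER in slug:
        raise ValueError(f"slug must not contain {DELIMITER!r}: {slug!r}")
    return f"{slug}{DELIMITER}{label}"


def split_bucket_key(key: str) -> Tuple[str, str]:
    """Recover (slug, label) from a terms-aggregation bucket key."""
    slug, _, label = key.partition(DELIMITER)
    return slug, label


# Made-up buckets in the {"key": ..., "doc_count": ...} shape of a terms aggregation.
buckets = [
    {"key": combine("nintendo", "Nintendo"), "doc_count": 42},
    {"key": combine("sony", "Sony"), "doc_count": 17},
]
for bucket in buckets:
    slug, label = split_bucket_key(bucket["key"])
    # Show the label to the customer, link to the slug-based search URL.
    print(f"{label} ({bucket['doc_count']}) -> /brand={slug}")
```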

However, the issue of doing a post_filter still remains (and the above solution isn't exactly pretty).

Another solution may be to simply create multiple fields, e.g.:

  {
    "category": "video-games-and-consoles;Video Games and Consoles",
    "category_slug": "video-games-and-consoles",
    "brand": "nintendo;Nintendo",
    "brand_slug": "nintendo"
  }

This would mean doing aggregations on one field, while doing the filtering on another field.
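With the fields above, a request could compute buckets from the combined `brand` field while a `post_filter` narrows the hits on `brand_slug`. The Python below only builds the request body as a dict; it assumes both fields are mapped as keyword, and the aggregation name and `size` are arbitrary choices.

```python
import json


def build_search_body(brand_slug: str) -> dict:
    """Aggregate on the combined field, filter hits on the slug field."""
    return {
        "query": {"match_all": {}},
        # Buckets come from the combined "slug;label" keyword field.
        "aggs": {"brands": {"terms": {"field": "brand", "size": 50}}},
        # post_filter is applied after the aggregations are computed, so it
        # narrows the returned hits without shrinking the brand buckets.
        "post_filter": {"term": {"brand_slug": brand_slug}},
    }


print(json.dumps(build_search_body("nintendo"), indent=2))
```

Keeping the brand buckets unfiltered is usually what a facet list wants, since the customer can still see and pick the other brands.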

Neither of the proposed solutions seems optimal.

Is there another way of approaching this issue, or are we stuck with an "ugly" solution of multiple fields? Could this be solved with some clever use of analyzers/search_analyzers that we are just not seeing?

(Mark Harwood) #2

It sounds like this is another case of the age-old issue of computers-want-IDs-but-people-want-labels.
An ID is unique enough to offer unambiguous queries which is good for retrieval but typically difficult for users to read on a display.

The least-worst solution I tend to opt for is combining both the ID and the label in a single indexed token, as in the example of looking at Panama Papers data [1]. This way the token is both readable and unambiguous. You may want to assign a shorter unique ID in place of your URLs to save space.
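One way to get a shorter ID without keeping a lookup table is to hash the slug, so the same slug always produces the same ID and a slug taken from a URL can be turned into a filter value directly. This is a swapped-in technique, not what the reply describes; the names and the 4-byte digest size are assumptions.

```python
import hashlib


def short_id(slug: str, size: int = 4) -> str:
    """Derive a compact ID from a slug; the same slug always yields the same ID."""
    # blake2b can produce a short digest directly; 4 bytes -> 8 hex characters.
    return hashlib.blake2b(slug.encode("utf-8"), digest_size=size).hexdigest()


def id_label_token(slug: str, label: str) -> str:
    """One keyword token that is both unambiguous (ID) and readable (label)."""
    return f"{short_id(slug)};{label}"


print(id_label_token("video-games-and-consoles", "Video Games and Consoles"))
```

The trade-off is collisions: with only 4 bytes, two slugs sharing an ID becomes plausible once there are tens of thousands of distinct values, so a larger `size` is safer for big vocabularies.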

At some stage we may provide better tooling for IDs and associated user labels in Kibana+elasticsearch but for now this combining of ID + label in tokens is the approach I tend to use.


(Henrik Ossipoff Hansen) #3

Hi Mark,

You're absolutely right: it's an age-old question. I've seen it come up before but haven't really seen anyone solve it in a nice way.

Your solution matches one of the options we described, but we're unsure how you would effectively filter for a specific "ID".

Or do you mean that your least-worst solution is about creating two fields, one with simply the ID (to be filterable) and another one that combines the ID and the label?

(Mark Harwood) #4

Depending on your needs, you can have any combination of these fields:

  1. ID (keyword)

  2. Label (text)

  3. ID+Label (keyword)

Field 2 can be used in free-text search and field 1 can be used for drill-downs.
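The three field types listed above could be declared in a single index mapping. The sketch below builds the mapping body as a Python dict; the field names are hypothetical, and it assumes the typeless mapping format of recent Elasticsearch versions (7 and later).

```python
import json

# Hypothetical field names: one set of the three field types per facet.
mapping = {
    "mappings": {
        "properties": {
            # 1) ID: exact-match keyword, used for drill-down filters.
            "brand_id": {"type": "keyword"},
            # 2) Label: analyzed text, used for free-text search.
            "brand_label": {"type": "text"},
            # 3) ID+Label: keyword, aggregatable and human-readable.
            "brand_id_label": {"type": "keyword"},
        }
    }
}

print(json.dumps(mapping, indent=2))
```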
However, Kibana assumes that your choice of aggregatable field is both human-readable and unambiguous. There is no dynamic lookup of labels for display purposes given an ID, nor any translation on user-click to convert a displayed label back into an ID for use in drill-down filters. There's no two-way translation between computer-speak and human-speak.
To overcome this translation problem I find it useful to use type 3 tokens in Kibana for choices of aggregatable field. It means my bar charts etc. have reasonably readable labels and still work as filters when I drill down.
You can still mix this policy with use of field types 1 and 2 for other reasons outside of Kibana. An ingest pipeline could be useful for assembling these ID+label tokens from the original JSON.
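A minimal sketch of such a pipeline, built as a Python dict for the body of a `PUT _ingest/pipeline/<name>` request, could use a `set` processor with a Mustache template. The field names are hypothetical and the reply does not specify any particular processor; triple braces are used because in Mustache they insert a value without escaping it.

```python
import json

# Body for PUT _ingest/pipeline/<name>; field names are hypothetical.
pipeline = {
    "description": "Assemble an ID+label keyword token for aggregations",
    "processors": [
        {
            "set": {
                "field": "brand_id_label",
                # Template reads other fields of the incoming document;
                # triple braces insert the values unescaped.
                "value": "{{{brand_id}}};{{{brand_label}}}",
            }
        }
    ],
}

print(json.dumps(pipeline, indent=2))
```

Documents indexed with this pipeline would then carry the combined token without the application having to build it.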

(Henrik Ossipoff Hansen) #5

Got it - makes sense. It basically boils down to what we had thought of ourselves - we just hoped for a more elegant solution.

