We have a use case that we are trying to crack, but are not entirely sure how this would be best handled. Essentially, it boils down to being able to save "multiple values" for the same field in aggregations, while also being able to filter on them.
The main idea for our data is that we want to separate the search data 100% from our other data, i.e. without relying on IDs to look up later in a relational database or in some other Elasticsearch index.
The values we want to use for lookup/filtering will always be "slugs" of values, since these come from a special "Search URL DSL" (e.g. a URL could be /video-games-and-consoles/brand=nintendo to filter to the category "Video Games and Consoles" and filter for the attribute/facet Brand with the value "Nintendo").
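To make the URL format concrete, here is a minimal sketch of how the application might split such a path into a category slug and facet filters. The function name parse_search_path is hypothetical, and the grammar (first plain segment is the category, every name=value segment is a facet) is an assumption based only on the single example above.

```python
from urllib.parse import unquote


def parse_search_path(path: str) -> dict:
    """Split a search URL path into a category slug and facet filters.

    Assumed grammar (illustrative only): the first segment without '='
    is the category slug; each 'name=value' segment is a facet filter.
    """
    segments = [s for s in path.strip("/").split("/") if s]
    category_slug = None
    filters = {}
    for segment in segments:
        if "=" in segment:
            # partition splits on the first '=' only
            name, _, value = segment.partition("=")
            filters[unquote(name)] = unquote(value)
        elif category_slug is None:
            category_slug = unquote(segment)
    return {"category": category_slug, "filters": filters}


print(parse_search_path("/video-games-and-consoles/brand=nintendo"))
# {'category': 'video-games-and-consoles', 'filters': {'brand': 'nintendo'}}
```

So the only inputs available at query time are slugs, never the display values.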
This means, in the ideal world, the document would contain both the slug and the actual value.
A sample document could look like:
{
"category": "Video Games and Consoles",
"brand": "Nintendo"
}
This layout would make aggregations very easy: given that everything uses the keyword analyzer, every bucket would be displayed correctly.
However, it is then not really possible to apply a post_filter with the slug, since the slug isn't in the document.
It also isn't possible to get the slug from the value without transforming the data in the application.
One solution could be to make two category fields, one with the slug and one with the value, but since there isn't a connection between them, it wouldn't really work with aggregations.
A proposed document could be:
{
"category": "video-games-and-consoles;Video Games and Consoles",
"brand": "nintendo;Nintendo"
}
Now, when making aggregations, each bucket will effectively have both the slug and the value to show to the customer, with the application only needing to split on the delimiter.
However, the issue of doing a post_filter still remains (and the above solution isn't exactly pretty).
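For reference, the split on the delimiter that the application would have to do could look roughly like this. This is only a sketch: the helper name split_bucket_key and the sample bucket (including its doc_count) are made up for illustration, and it assumes slugs never contain the ";" delimiter, so splitting on the first occurrence is safe even if a display value contains one.

```python
from typing import Tuple


def split_bucket_key(key: str, delimiter: str = ";") -> Tuple[str, str]:
    """Split a combined 'slug;Display Value' bucket key into (slug, label).

    Splits on the first delimiter only, assuming the slug itself
    never contains the delimiter.
    """
    slug, _, label = key.partition(delimiter)
    return slug, label


# Hypothetical terms-aggregation bucket; the doc_count is invented sample data.
buckets = [{"key": "nintendo;Nintendo", "doc_count": 12}]

facets = [
    {"slug": slug, "label": label, "count": bucket["doc_count"]}
    for bucket in buckets
    for slug, label in [split_bucket_key(bucket["key"])]
]
print(facets)
# [{'slug': 'nintendo', 'label': 'Nintendo', 'count': 12}]
```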
Another solution may be to simply create multiple fields, e.g.:
{
"category": "video-games-and-consoles;Video Games and Consoles",
"category_slug": "video-games-and-consoles",
"brand": "nintendo;Nintendo",
"brand_slug": "nintendo"
}
This would mean doing aggregations on one field, while doing the filtering on another field.
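As a rough sketch of what that would mean in practice, the request body could be assembled like this. It assumes all four fields are mapped as keyword, that the category is applied in the main query while facet values go into the post_filter, and that the helper name build_search_body and the "<facet>_slug" naming convention are our own illustration rather than anything Elasticsearch prescribes.

```python
import json


def build_search_body(category_slug: str, facet_slugs: dict, agg_fields=("brand",)) -> dict:
    """Build a search body that filters on *_slug fields and aggregates
    on the combined 'slug;Value' fields.

    Assumes keyword-mapped fields and a '<facet>_slug' naming convention.
    """
    body = {
        # The category restricts both hits and aggregations.
        "query": {"bool": {"filter": [{"term": {"category_slug": category_slug}}]}},
        # Aggregate on the combined fields so each bucket carries slug and label.
        "aggs": {field: {"terms": {"field": field}} for field in agg_fields},
    }
    facet_terms = [
        {"term": {f"{name}_slug": slug}} for name, slug in facet_slugs.items()
    ]
    if facet_terms:
        # post_filter narrows the hits without affecting the aggregations.
        body["post_filter"] = {"bool": {"filter": facet_terms}}
    return body


body = build_search_body("video-games-and-consoles", {"brand": "nintendo"})
print(json.dumps(body, indent=2))
```

The resulting dict could be sent as the body of a normal _search request; the filtering touches only the slug fields, and the aggregation touches only the combined fields.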
Neither of the proposed solutions seems optimal.
Is there another way of approaching this issue, or are we stuck with an "ugly" solution of multiple fields? Could this be solved with some clever use of analyzers/search_analyzers that we are just not seeing?