Name resolution in hierarchical facets. How to do it better?

Hi,

I have a question about how to model the following scenario in ES, and whether there is a better way than what we already have. In our system there are documents and categories for these documents. The categories can be hierarchical and the category names are not unique, so every category has a name and an ID. At the moment, a document looks something like this when we send it to ES:

{
  "title": "Foo",
  "content": "Bar",
  "category": "-1/123/456",
  "category_names": [
    {
      "id": 123,
      "name": "Category1"
    },
    {
      "id": 456,
      "name": "Category2"
    }
  ]
}

We include category_names because the frontend that retrieves the document needs to render the category hierarchy with human-readable names, and we do not want to make an extra lookup for each ID. The category is also indexed as ["-1", "-1/123", "-1/123/456"] so we can find it when searching or aggregating at each level of the hierarchy.
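
For illustration, the expansion of a category path into its indexed prefixes could look like the minimal Python sketch below. The function name expand_category_path is made up for this example, and the sketch assumes the expansion happens on the client before indexing; the post does not say where it is done (Elasticsearch's path_hierarchy tokenizer would be one way to do the same thing at index time).

```python
def expand_category_path(path: str, sep: str = "/") -> list[str]:
    """Return every ancestor prefix of a category path, shortest first.

    Hypothetical helper: the name and the client-side approach are
    assumptions made for this sketch, not taken from the original post.
    """
    parts = path.split(sep)
    # Join the first 1, 2, ..., n segments back together.
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]


print(expand_category_path("-1/123/456"))
# ['-1', '-1/123', '-1/123/456']
```
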
The main use is a bucket aggregation over the category field, so we get how many documents there are in total (bucket: -1), how many there are in Category1 (bucket: -1/123), and so on. The aggregation response only includes the keys (-1/123, -1/123/456), so we have to go over all the documents programmatically and build an id:name lookup map to render them in human-readable form. Another idea, which I couldn't get to work, was to ask ES to also deliver a list of the unique category_names along with the aggregation, so the lookup map would already be complete when it comes from ES and would not have to be built by the frontend. (We also don't need the documents themselves in the response, just the name list and the aggregation.)
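
To make the current approach concrete, here is a rough Python sketch of that client-side step: walk the returned hits, collect id:name pairs, then translate a bucket key into names. The function names, the "All" label for the -1 root, and the assumption that hits arrive in the usual _source shape are all illustrative choices, not something the real frontend is known to do.

```python
def build_name_lookup(hits: list[dict]) -> dict[int, str]:
    """Collect id -> name pairs from the category_names of every hit."""
    lookup: dict[int, str] = {}
    for hit in hits:
        for entry in hit["_source"].get("category_names", []):
            lookup[entry["id"]] = entry["name"]
    return lookup


def label_bucket_key(key: str, lookup: dict[int, str], root_label: str = "All") -> list[str]:
    """Translate a path key such as "-1/123" into human-readable names.

    "-1" is treated as the root bucket (root_label is a made-up label);
    ids missing from the lookup fall back to the raw id string.
    """
    labels = []
    for part in key.split("/"):
        if part == "-1":
            labels.append(root_label)
        else:
            labels.append(lookup.get(int(part), part))
    return labels


# Sample hit shaped like the document from the post.
hits = [
    {"_source": {"category_names": [
        {"id": 123, "name": "Category1"},
        {"id": 456, "name": "Category2"},
    ]}}
]
lookup = build_name_lookup(hits)
print(label_bucket_key("-1/123/456", lookup))
# ['All', 'Category1', 'Category2']
```

The drawback described above is visible here: the lookup is only as complete as the set of hits that was fetched, so the documents have to be requested even though only the names are needed.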

Is there a good way to add name fields to the aggregation, e.g. add Category1 to the -1/123 bucket? Or what would be a good concept for hierarchical categories that are based on IDs but also have names? This seems like a completely standard problem, since every shop presumably has to solve it. Or is the default really to index a path of human-readable names? I don't see how that would work with multilingual sites, for example.

You’re right, it is a common requirement. I remember having discussions on that topic in this issue which may be of use:

I actually managed to do it in a single query. First, we define the index so that category_names is nested, since the id and name fields belong together.

PUT /testindex
{
  "mappings": {
    "properties": {
      "category_names": { 
        "type": "nested",
        "properties": {
            "id": {
              "type": "long"
            },
            "name": {
              "type": "keyword"
            }
        }
      }
    }
  }
}

Then we put some documents on the index:

PUT /testindex/_doc/1
{
  "category_names": [
    {"id": 1, "name": "foo"},
    {"id": 2, "name": "bar"}
  ]
}

PUT /testindex/_doc/2
{
  "category_names": [
    {"id": 3, "name": "blah"},
    {"id": 4, "name": "baz"}
  ]
}

PUT /testindex/_doc/3
{
  "category_names": [
    {"id": 1, "name": "foo"},
    {"id": 4, "name": "baz"},
    {"id": 2, "name": "bar"}
  ]
}

We can now run a multi_terms aggregation over the nested category_names field. Since the id/name tuples are unique, the same id will never appear with a different name.

GET /testindex/_search
{
  "aggs": {
    "outer_expand_nested_field": {
      "nested": {
        "path": "category_names"
      },
      "aggs": {
        "actual_aggregation": {
          "multi_terms": {
            "terms": [
              {"field": "category_names.id" }, 
              {"field": "category_names.name"}
            ]
          }
        }
      }
    }
  }
}

This yields buckets whose keys contain both the id and the name, together with the doc_counts:

        "buckets": [
          {
            "key": [
              1,
              "foo"
            ],
            "key_as_string": "1|foo",
            "doc_count": 2
          },
          {
            "key": [
              2,
              "bar"
            ],
            "key_as_string": "2|bar",
            "doc_count": 2
          },
          {
            "key": [
              4,
              "baz"
            ],
            "key_as_string": "4|baz",
            "doc_count": 1
          }
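
On the client, the id:name lookup map can then be read straight from those buckets. The Python sketch below assumes the response has the standard aggregations layout and uses the aggregation names from the query above; the function name lookup_from_multi_terms is invented for this example.

```python
def lookup_from_multi_terms(response: dict) -> dict[int, str]:
    """Build an id -> name map from the nested multi_terms buckets.

    Assumes the aggregation names used in the query above
    (outer_expand_nested_field / actual_aggregation).
    """
    buckets = (
        response["aggregations"]["outer_expand_nested_field"]
        ["actual_aggregation"]["buckets"]
    )
    # Each bucket key is an [id, name] pair.
    return {bucket["key"][0]: bucket["key"][1] for bucket in buckets}


# Response fragment shaped like the buckets shown above.
response = {
    "aggregations": {
        "outer_expand_nested_field": {
            "actual_aggregation": {
                "buckets": [
                    {"key": [1, "foo"], "key_as_string": "1|foo", "doc_count": 2},
                    {"key": [2, "bar"], "key_as_string": "2|bar", "doc_count": 2},
                    {"key": [4, "baz"], "key_as_string": "4|baz", "doc_count": 1},
                ]
            }
        }
    }
}
print(lookup_from_multi_terms(response))
# {1: 'foo', 2: 'bar', 4: 'baz'}
```

Two things are probably worth checking in a real setup: adding "size": 0 to the search body should keep the documents themselves out of the response, and multi_terms only returns a limited number of buckets by default, so its size parameter may need raising when there are many categories.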