Name resolution in hierarchical facets. How to do it better?

Zer0 · October 10, 2023, 9:30am

Hi,

I have a question about how to model the following scenario in ES, and whether there is a better way than the one we already have. In our system there are documents and categories for these documents. The categories can be hierarchical and category names are not unique, so every category has both a name and an ID. At the moment a document looks something like this when we send it to ES:

{
  "title": "Foo",
  "content": "Bar",
  "category": "-1/123/456",
  "category_names": [
    {
      "id": 123,
      "name": "Category1"
    },
    {
      "id": 456,
      "name": "Category2"
    }
  ]
}

The reason we have category_names is that the frontend that retrieves this needs to render the category hierarchy with human-readable names, and we do not want to make an extra lookup for each ID. The category field is also indexed as ["-1", "-1/123", "-1/123/456"] so we can find the document when searching or aggregating at each level of the hierarchy.
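The prefix expansion described above can be sketched in a few lines of Python. This is only an illustration: the helper name `expand_category_path` is hypothetical, and the thread does not say whether the expansion happens in application code or in an analyzer (Elasticsearch's `path_hierarchy` tokenizer produces the same kind of prefixes).

```python
def expand_category_path(path: str, sep: str = "/") -> list[str]:
    """Return every ancestor prefix of a category path, root first.

    Hypothetical helper: mirrors the ["-1", "-1/123", "-1/123/456"]
    expansion described in the post.
    """
    parts = path.split(sep)
    # Join the first 1, 2, ..., n segments back into prefix paths.
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]


if __name__ == "__main__":
    print(expand_category_path("-1/123/456"))
```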
The main use is a bucket aggregation over the category field, so we get how many documents there are in total (bucket: -1), how many there are in Category1 (bucket: -1/123), and so on. The aggregation response only includes the keys (-1/123, -1/123/456), so we have to go over all the documents programmatically and build an id:name lookup map to render the buckets in human-readable form. Another idea, which I couldn't make work, was to ask ES to also return a list of the unique category_names along with the aggregation, so that the lookup map would already be complete when it comes back from ES and would not have to be built by the frontend. (We also don't need the documents themselves in the response, just the name list and the aggregation.)
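For reference, the current client-side approach might look roughly like the sketch below. The function names and the "All" label for the -1 root are assumptions made for illustration; only the document shape (`category_names` with `id` and `name`) and the bucket key format come from the post.

```python
def build_name_lookup(docs):
    """Collect an id -> name map from the category_names of each document."""
    lookup = {}
    for doc in docs:
        for entry in doc.get("category_names", []):
            lookup[entry["id"]] = entry["name"]
    return lookup


def label_bucket_key(key, lookup, root_label="All"):
    """Translate a bucket key such as "-1/123/456" into readable labels.

    Assumes -1 is the root bucket; root_label is a hypothetical display name.
    Unknown ids fall back to the raw id string.
    """
    labels = []
    for part in key.split("/"):
        cat_id = int(part)
        if cat_id == -1:
            labels.append(root_label)
        else:
            labels.append(lookup.get(cat_id, part))
    return labels


if __name__ == "__main__":
    docs = [{"category_names": [{"id": 123, "name": "Category1"},
                                {"id": 456, "name": "Category2"}]}]
    lookup = build_name_lookup(docs)
    print(label_bucket_key("-1/123/456", lookup))
```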

Is there any good way to add name fields to the aggregation, e.g. to attach Category1 to the -1/123 bucket? Or what would be a good concept for hierarchical categories that are based on IDs but also have names? It seems like an absolutely standard problem to me, since every shop probably has to do this. Or is the default really to index a path of human-readable names? I don't see how that would work with, for example, multilingual sites.

Mark_Harwood1 · October 10, 2023, 2:01pm

You’re right, it is a common requirement. I remember having discussions on that topic in this issue which may be of use:

github.com/elastic/kibana

Computers need IDs, people want labels

opened 05:29PM - 10 Feb 17 UTC

elasticmachine

Feature:Graph Team:Visualizations impact:low

*Original comment by @markharwood:* This old chestnut is a general concern with Kibana and specifically an issue in Graph. The unit of our analysis is terms (terms aggs, significant_terms etc) and for this reason they need to be unique:
- There is more than one movie called "crash" in the movielens data
- There is more than one John Smith in a bank's records.
Consequently, to avoid confusion, unique IDs are generated to represent these entities and we must index those for analysis BUT when visualizing data in graph UI or elsewhere people typically don't want to see the ugly IDs and want useful labels instead. This translation service could be a configurable feature of graph (_"the label for ID field X can be found in index Y and field Z"_). This translation can be implemented as a single multi-get operation when new IDs are loaded into the graph workspace. Equally this could be a general feature as part of Kibana for use in all visualizations. In looking at Panama papers I was forced to index terms that were both an ID and a label - the ID was required to avoid merging multiple "John Smith"s into one but the label was also required to be useful to end users. This made for an ugly UI and added code to the ingest process. The bank client forked the graph UI to trim the ID part of the term from the displayed terms in order to make the UI less ugly.
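The "single multi-get" translation mentioned in the issue could be sketched as below. This assumes a separate label index, here given the hypothetical name `category_labels`, holding one document per category ID with a `name` field; neither exists in the thread. The request and response shapes follow the Elasticsearch `_mget` API, and the code only builds and parses plain dictionaries without contacting a cluster.

```python
def build_label_mget(ids, index="category_labels"):
    """Build an _mget request body for a batch of category ids.

    `index` is a hypothetical label index, one document per category id.
    Duplicate ids are dropped while keeping first-seen order.
    """
    unique_ids = list(dict.fromkeys(str(i) for i in ids))
    return {"docs": [{"_index": index, "_id": doc_id} for doc_id in unique_ids]}


def labels_from_mget(response, label_field="name"):
    """Turn an _mget response into an id -> label map, skipping misses."""
    labels = {}
    for doc in response.get("docs", []):
        if doc.get("found"):
            labels[doc["_id"]] = doc["_source"][label_field]
    return labels


if __name__ == "__main__":
    body = build_label_mget([123, 456, 123])
    print(body)  # would be sent as the body of POST /_mget
```

The trade-off against the nested aggregation shown later in the thread is one extra round trip in exchange for keeping labels out of every document, which may matter for multilingual labels.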

Zer0 · October 12, 2023, 1:33pm

I actually managed to do it in a single query. First we define the index so that category_names is nested, since the id/name fields belong together.

PUT /testindex
{
  "mappings": {
    "properties": {
      "category_names": {
        "type": "nested",
        "properties": {
          "id": {
            "type": "long"
          },
          "name": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Then we put some documents on the index:

PUT /testindex/_doc/1
{
  "category_names": [
    {"id": 1, "name": "foo"},
    {"id": 2, "name": "bar"}
  ]
}

PUT /testindex/_doc/2
{
  "category_names": [
    {"id": 3, "name": "blah"},
    {"id": 4, "name": "baz"}
  ]
}

PUT /testindex/_doc/3
{
  "category_names": [
    {"id": 1, "name": "foo"},
    {"id": 4, "name": "baz"},
    {"id": 2, "name": "bar"}
  ]
}

We can now do a multi_terms aggregation over the nested category_names field (since the id/name tuples are unique, there will not be the same id with a different name).

GET /testindex/_search
{
  "aggs": {
    "outer_expand_nested_field": {
      "nested": {
        "path": "category_names"
      },
      "aggs": {
        "actual_aggregation": {
          "multi_terms": {
            "terms": [
              {"field": "category_names.id" }, 
              {"field": "category_names.name"}
            ]
          }
        }
      }
    }
  }
}

This yields buckets whose keys contain both the id and the name, together with the doc_count for each pair.

        "buckets": [
          {
            "key": [
              1,
              "foo"
            ],
            "key_as_string": "1|foo",
            "doc_count": 2
          },
          {
            "key": [
              2,
              "bar"
            ],
            "key_as_string": "2|bar",
            "doc_count": 2
          },
          {
            "key": [
              4,
              "baz"
            ],
            "key_as_string": "4|baz",
            "doc_count": 2
          }
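
On the client side, the buckets above can be folded into the id:name lookup map in one pass. The sketch below assumes the response is already parsed into a Python dict and uses the aggregation names from the query above; the function name is hypothetical.

```python
def lookup_from_multi_terms(response):
    """Build an id -> name map from the nested multi_terms aggregation.

    Assumes the aggregation names used in the query above:
    outer_expand_nested_field > actual_aggregation.
    Each bucket key is a two-element list: [id, name].
    """
    buckets = (
        response["aggregations"]["outer_expand_nested_field"]
        ["actual_aggregation"]["buckets"]
    )
    return {bucket["key"][0]: bucket["key"][1] for bucket in buckets}


if __name__ == "__main__":
    sample = {
        "aggregations": {
            "outer_expand_nested_field": {
                "actual_aggregation": {
                    "buckets": [
                        {"key": [1, "foo"], "key_as_string": "1|foo"},
                        {"key": [2, "bar"], "key_as_string": "2|bar"},
                    ]
                }
            }
        }
    }
    print(lookup_from_multi_terms(sample))
```

Two things to keep in mind: adding "size": 0 at the top level of the search request suppresses the hits when only the aggregation is needed, and multi_terms returns only the top buckets by default, so its own size parameter may have to be raised to cover every category.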

system · November 9, 2023, 1:34pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
