Faceting/aggregations of nested fields

vadim-permakoff · June 24, 2022, 10:19pm

Hi,
I need to implement a faceting feature for nested fields. For example, I have two documents, each has an “applicability” array, which defines nested fields state, lob and year, specifying where this document is applicable:

{
  "title": "Document 1",
  "applicability": [
    {
      "state": "NY",
      "lob": "AA",
      "year": 2001
    },
    {
      "state": "NJ",
      "lob": "BB",
      "year": 2001
    },
    {
      "state": "NJ",
      "lob": "BB",
      "year": 2002
    }
  ]
}

{
  "title": "Document 2",
  "applicability": [ 
    {
      "state": "NJ",
      "lob": "AA",
      "year": 2001
    },
    {
      "state": "NJ",
      "lob": "BB",
      "year": 2001
    },
    {
      "state": "NJ",
      "lob": "BB",
      "year": 2002
    }
  ]
}

The mapping is very simple, all keywords with nested applicability.
Let’s say, the task is to return all documents applicable for NJ state and display the state and lob facets.
I run following query with aggregations:

{
    "from": 0,
    "size": 0,
    "query": {
        "nested": {
            "path": "applicability",
            "query": {
                "query_string": { "query": "applicability.state:NJ" }
            }
        }
    },
    "aggregations": {
        "applicability": {
            "nested": {
                "path": "applicability"
            },
            "aggregations": {
                "filtered": {
                    "filter": {
                        "query_string": { "query": "applicability.state:NJ" }
                    },
                    "aggregations": {
                        "state": {
                            "terms": {
                                "field": "applicability.state"
                            }
                        },
                        "lob": {
                            "terms": {
                                "field": "applicability.lob"
                            }
                        }
                    }
                }
            }
        }
    }
}

And I got result:

{
    "took": 9,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "applicability": {
            "doc_count": 6,
            "filtered": {
                "doc_count": 5,
                "state": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                        {
                            "key": "NJ",
                            "doc_count": 5
                        }
                    ]
                },
                "lob": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                        {
                            "key": "BB",
                            "doc_count": 4
                        },
                        {
                            "key": "AA",
                            "doc_count": 1
                        }
                    ]
                }
            }
        }
    }
}

The result is two documents (as expected), but faceting has state doc_count NJ:5 and for lob doc_count AA:1, BB:4. However only two documents are found and I expect facets for state NJ:2 and lob AA:1, BB:2. How to achieve that? Tried to use cardinality, but run into issue of approximation for more than 40000 results, tried to use scripts, it returns correct counts, but works very slow. It seems to me it should be very common problem, but I cannot find a good solution, may be I'm missing something.

Tomo_M · June 26, 2022, 1:33pm

Use reverse_nested aggregation. You can count the doc_count of root documents.

PUT test_nested
{
  "mappings": {
    "properties": {
      "n":{
        "type":"nested",
        "properties": {
          "a":{
            "type":"keyword"
          }
        }
      }
    }
  }
}

POST test_nested/_doc
{
  "n":[
    {"a": "foo"},
    {"a":"baa"}]
}
POST test_nested/_doc
{
  "n":[
    {"a": "foo"},
    {"a": "foo"},
    {"a": "foo"},
    {"a":"baa"}]
}

GET test_nested/_search
{
  "size":0,
  "aggs":{
    "n":{
      "nested":{
        "path": "n"
      },
      "aggs":{
        "t":{
          "terms": {
            "field": "n.a",
            "size": 10
          },
          "aggs":{
            "r":{
              "reverse_nested": {}
            }
          }
        }
      }
    }
  }
}

vadim-permakoff · June 28, 2022, 9:11pm

The reverse_nested aggregation works! Thank you!
Implemented in Java as following:
Query:
if (grouped) {
termsAggregationBuilder.subAggregation(AggregationBuilders.reverseNested(UNIQUE));
}

Result:
if (grouped) {
ParsedReverseNested parsedReverseNested = bucket.getAggregations().get(UNIQUE);
value = parsedReverseNested.getDocCount();
} else {
value = bucket.getDocCount();
}

system · July 26, 2022, 9:11pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Nested documents & faceting Elasticsearch	3	850	July 6, 2017
Facets on nested documents Elasticsearch	1	285	July 6, 2017
Facet search on nested fields Elasticsearch	2	473	July 6, 2017
Nested and faceting Elasticsearch	1	242	July 6, 2017
Terms Facet problem with nested document Elasticsearch	1	346	July 6, 2017

Faceting/aggregations of nested fields

Related topics