Faceting/aggregations of nested fields

Hi,
I need to implement a faceting feature for nested fields. For example, I have two documents, each has an “applicability” array, which defines nested fields state, lob and year, specifying where this document is applicable:

{
  "title": "Document 1",
  "applicability": [
    {
      "state": "NY",
      "lob": "AA",
      "year": 2001
    },
    {
      "state": "NJ",
      "lob": "BB",
      "year": 2001
    },
    {
      "state": "NJ",
      "lob": "BB",
      "year": 2002
    }
  ]
}

{
  "title": "Document 2",
  "applicability": [ 
    {
      "state": "NJ",
      "lob": "AA",
      "year": 2001
    },
    {
      "state": "NJ",
      "lob": "BB",
      "year": 2001
    },
    {
      "state": "NJ",
      "lob": "BB",
      "year": 2002
    }
  ]
}

The mapping is very simple, all keywords with nested applicability.
Let’s say, the task is to return all documents applicable for NJ state and display the state and lob facets.
I run following query with aggregations:

{
    "from": 0,
    "size": 0,
    "query": {
        "nested": {
            "path": "applicability",
            "query": {
                "query_string": { "query": "applicability.state:NJ" }
            }
        }
    },
    "aggregations": {
        "applicability": {
            "nested": {
                "path": "applicability"
            },
            "aggregations": {
                "filtered": {
                    "filter": {
                        "query_string": { "query": "applicability.state:NJ" }
                    },
                    "aggregations": {
                        "state": {
                            "terms": {
                                "field": "applicability.state"
                            }
                        },
                        "lob": {
                            "terms": {
                                "field": "applicability.lob"
                            }
                        }
                    }
                }
            }
        }
    }
}

And I got result:

{
    "took": 9,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "applicability": {
            "doc_count": 6,
            "filtered": {
                "doc_count": 5,
                "state": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                        {
                            "key": "NJ",
                            "doc_count": 5
                        }
                    ]
                },
                "lob": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                        {
                            "key": "BB",
                            "doc_count": 4
                        },
                        {
                            "key": "AA",
                            "doc_count": 1
                        }
                    ]
                }
            }
        }
    }
}

The result is two documents (as expected), but faceting has state doc_count NJ:5 and for lob doc_count AA:1, BB:4. However only two documents are found and I expect facets for state NJ:2 and lob AA:1, BB:2. How to achieve that? Tried to use cardinality, but run into issue of approximation for more than 40000 results, tried to use scripts, it returns correct counts, but works very slow. It seems to me it should be very common problem, but I cannot find a good solution, may be I'm missing something.

Use reverse_nested aggregation. You can count the doc_count of root documents.

PUT test_nested
{
  "mappings": {
    "properties": {
      "n":{
        "type":"nested",
        "properties": {
          "a":{
            "type":"keyword"
          }
        }
      }
    }
  }
}

POST test_nested/_doc
{
  "n":[
    {"a": "foo"},
    {"a":"baa"}]
}
POST test_nested/_doc
{
  "n":[
    {"a": "foo"},
    {"a": "foo"},
    {"a": "foo"},
    {"a":"baa"}]
}

GET test_nested/_search
{
  "size":0,
  "aggs":{
    "n":{
      "nested":{
        "path": "n"
      },
      "aggs":{
        "t":{
          "terms": {
            "field": "n.a",
            "size": 10
          },
          "aggs":{
            "r":{
              "reverse_nested": {}
            }
          }
        }
      }
    }
  }
}

The reverse_nested aggregation works! Thank you!
Implemented in Java as following:
Query:
if (grouped) {
termsAggregationBuilder.subAggregation(AggregationBuilders.reverseNested(UNIQUE));
}

Result:
if (grouped) {
ParsedReverseNested parsedReverseNested = bucket.getAggregations().get(UNIQUE);
value = parsedReverseNested.getDocCount();
} else {
value = bucket.getDocCount();
}

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.