Hi,
I need to implement a faceting feature for nested fields. For example, I have two documents, each has an “applicability” array, which defines nested fields state, lob and year, specifying where this document is applicable:
{
"title": "Document 1",
"applicability": [
{
"state": "NY",
"lob": "AA",
"year": 2001
},
{
"state": "NJ",
"lob": "BB",
"year": 2001
},
{
"state": "NJ",
"lob": "BB",
"year": 2002
}
]
}
{
"title": "Document 2",
"applicability": [
{
"state": "NJ",
"lob": "AA",
"year": 2001
},
{
"state": "NJ",
"lob": "BB",
"year": 2001
},
{
"state": "NJ",
"lob": "BB",
"year": 2002
}
]
}
The mapping is very simple, all keywords with nested applicability.
Let’s say, the task is to return all documents applicable for NJ state and display the state and lob facets.
I run following query with aggregations:
{
"from": 0,
"size": 0,
"query": {
"nested": {
"path": "applicability",
"query": {
"query_string": { "query": "applicability.state:NJ" }
}
}
},
"aggregations": {
"applicability": {
"nested": {
"path": "applicability"
},
"aggregations": {
"filtered": {
"filter": {
"query_string": { "query": "applicability.state:NJ" }
},
"aggregations": {
"state": {
"terms": {
"field": "applicability.state"
}
},
"lob": {
"terms": {
"field": "applicability.lob"
}
}
}
}
}
}
}
}
And I got result:
{
"took": 9,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"applicability": {
"doc_count": 6,
"filtered": {
"doc_count": 5,
"state": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "NJ",
"doc_count": 5
}
]
},
"lob": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "BB",
"doc_count": 4
},
{
"key": "AA",
"doc_count": 1
}
]
}
}
}
}
}
The result is two documents (as expected), but faceting has state doc_count NJ:5 and for lob doc_count AA:1, BB:4. However only two documents are found and I expect facets for state NJ:2 and lob AA:1, BB:2. How to achieve that? Tried to use cardinality, but run into issue of approximation for more than 40000 results, tried to use scripts, it returns correct counts, but works very slow. It seems to me it should be very common problem, but I cannot find a good solution, may be I'm missing something.