Hi.
After upgrading Elasticsearch from 2.x to 6.1.2 (and creating a new index from scratch), I am seeing `significant_terms` behavior I don't understand. Pretty much the only thing that has changed is the type of productNodeIds, which went from long to integer, but that shouldn't make any difference, should it?
The query & mappings
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "applicationDate": {
              "gte": "2015-01-01"
            }
          }
        },
        {
          "terms": {
            "owners.id": ["/owner/foo", "/owner/bar"]
          }
        }
      ]
    }
  },
  "aggregations": {
    "significantProducts": {
      "significant_terms": {
        "field": "productNodeIds",
        "size": 3,
        "min_doc_count": 10,
        "background_filter": {
          "bool": {
            "filter": [
              {
                "range": {
                  "applicationDate": {
                    "gte": "2008-01-01"
                  }
                }
              },
              {
                "terms": {
                  "owners.id": ["/owner/foo", "/owner/bar"]
                }
              }
            ]
          }
        }
      }
    }
  }
}
With the following mappings:
"applicationDate": {
"type": "date",
"format": "date"
},
"productNodeIds": {
"type": "integer"
},
"owners": {
"properties": {
"id": {
"type": "keyword"
}
}
}
And the response:
{
  "took": 91839,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 256,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "significantProducts": {
      "doc_count": 256,
      "bg_count": 1314,
      "buckets": [
        {
          "key": 62419,
          "doc_count": 22,
          "score": 9.6182861328125,
          "bg_count": 0
        },
        {
          "key": 87339,
          "doc_count": 22,
          "score": 9.6182861328125,
          "bg_count": 0
        },
        {
          "key": 10188,
          "doc_count": 22,
          "score": 9.6182861328125,
          "bg_count": 0
        }
      ]
    }
  }
}
Questions
- How can the query be so slow (92 seconds) when there are only 256 foreground hits and 1314 background hits? The query itself is instant, and if I run e.g. a `terms` aggregation instead of `significant_terms` it is also pretty much instant. (I'll re-run it with profiling enabled; see the request after this list.)
- The aggregation buckets make no sense. The background_filter is a superset of the foreground filter, so how can `bg_count` be zero? It should never be smaller than `doc_count`.
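For the latency question, I'll re-run the exact same request with the standard profile flag enabled to see where the 92 seconds go. The body below is identical to the query above, just with "profile": true added; I can post the profile output if that helps:

{
  "profile": true,
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "applicationDate": { "gte": "2015-01-01" } } },
        { "terms": { "owners.id": ["/owner/foo", "/owner/bar"] } }
      ]
    }
  },
  "aggregations": {
    "significantProducts": {
      "significant_terms": {
        "field": "productNodeIds",
        "size": 3,
        "min_doc_count": 10,
        "background_filter": {
          "bool": {
            "filter": [
              { "range": { "applicationDate": { "gte": "2008-01-01" } } },
              { "terms": { "owners.id": ["/owner/foo", "/owner/bar"] } }
            ]
          }
        }
      }
    }
  }
}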
What on earth is happening here?
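One more data point I can provide: since the background filter is a superset of the foreground filter, every one of the 22 foreground documents containing product 62419 (the key from the first bucket above) must also match the background filter. So I'd expect a plain count like the one below to report at least 22 hits in hits.total, not the bg_count of 0 that the aggregation returns:

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "applicationDate": { "gte": "2008-01-01" } } },
        { "terms": { "owners.id": ["/owner/foo", "/owner/bar"] } },
        { "term": { "productNodeIds": 62419 } }
      ]
    }
  }
}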
Thank you!