The terms facet returns incorrect counts when an index consists of several shards and term frequencies differ between shards.
Here are the steps to reproduce the bug:
First, create an index 'test' with 2 shards. Then execute the following 3 times:
$ curl -XPOST 'http://localhost:9200/test/mytype?routing=0/' -d '{ "value" : 5 }'
Execute once:
$ curl -XPOST 'http://localhost:9200/test/mytype?routing=0/' -d '{ "value" : 7 }'
Execute once:
$ curl -XPOST 'http://localhost:9200/test/mytype?routing=1/' -d '{ "value" : 5 }'
Execute twice:
$ curl -XPOST 'http://localhost:9200/test/mytype?routing=1/' -d '{ "value" : 7 }'
After that, we have seven documents: four on the first shard (routing 0) and three on the second shard (routing 1).
Now execute this query:
{
  "facets": {
    "value": {
      "terms": {
        "field": "value",
        "size": 1
      }
    }
  },
  "query": { "match_all": {} },
  "size": 0
}
The response is:
{
  .....
  "facets" : {
    "value" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 7,
      "other" : 4,
      "terms" : [{
          "term" : 5,
          "count" : 3
        }
      ]
    }
  }
}
But the actual frequency of the term '5' is 4 (3 documents on the first shard plus 1 on the second)!
If I change the facet size to 2, I receive the correct response:
{
  .....
  "facets" : {
    "value" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 7,
      "other" : 0,
      "terms" : [{
          "term" : 5,
          "count" : 4
        }, {
          "term" : 7,
          "count" : 3
        }
      ]
    }
  }
}
Elasticsearch takes the [size] most frequent terms on each shard rather than across the whole index. That is acceptable for a query_and_fetch request, but when I execute query_then_fetch I expect to receive the correct counts.
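To illustrate the suspected cause, here is a minimal Python sketch that models the behavior described above. It is not Elasticsearch code: the function names (shard_top_terms, merged_facet) and the merge logic are my assumptions about what happens, built only from the per-shard counts in this report.

```python
from collections import Counter

def shard_top_terms(shard_counts, size):
    """Hypothetical per-shard step: keep only the `size` most frequent terms."""
    return Counter(shard_counts).most_common(size)

def merged_facet(shards, size):
    """Hypothetical merge step: sum the truncated per-shard lists, then take the top `size`."""
    merged = Counter()
    for shard_counts in shards:
        for term, count in shard_top_terms(shard_counts, size):
            merged[term] += count
    total = sum(sum(s.values()) for s in shards)
    terms = merged.most_common(size)
    other = total - sum(count for _, count in terms)
    return {"total": total, "other": other, "terms": terms}

# Term counts per shard, taken from the reproduction steps:
# routing 0 holds three 5s and one 7, routing 1 holds one 5 and two 7s.
shards = [{5: 3, 7: 1}, {5: 1, 7: 2}]

print(merged_facet(shards, 1))  # {'total': 7, 'other': 4, 'terms': [(5, 3)]}
print(merged_facet(shards, 2))  # {'total': 7, 'other': 0, 'terms': [(5, 4), (7, 3)]}
```

With size 1, the second shard reports only the term 7, so its single document with value 5 is never counted. This model reproduces both the wrong count (3) and the "other" value (4) from the first response, and the correct counts from the second.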
Could you please fix this bug?