Term facet bug on multi-shard indexes

The terms facet returns incorrect counts when an index consists of several
shards and term frequencies differ between shards.
Here is how to reproduce the bug:

First, create an index 'test' with 2 shards. Then execute the following
3 times:

$ curl -XPOST 'http://localhost:9200/test/mytype?routing=0/' -d '{ "value" : 5 }'

Execute once:

$ curl -XPOST 'http://localhost:9200/test/mytype?routing=0/' -d '{ "value" : 7 }'

Execute once:

$ curl -XPOST 'http://localhost:9200/test/mytype?routing=1/' -d '{ "value" : 5 }'

Execute twice:

$ curl -XPOST 'http://localhost:9200/test/mytype?routing=1/' -d '{ "value" : 7 }'

After that, we have seven documents: four on the first shard (routing 0)
and three on the second shard (routing 1).
Now execute this query:
{
  "facets": {
    "value": {
      "terms": {
        "field": "value",
        "size": 1
      }
    }
  },
  "query": { "match_all": {} },
  "size": 0
}
The response is:
{
  .....
  "facets" : {
    "value" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 7,
      "other" : 4,
      "terms" : [{
        "term" : 5,
        "count" : 3
      }]
    }
  }
}
But the frequency of the term '5' is 4!
If I change the facet size to 2, I receive the correct response:
{
  .....
  "facets" : {
    "value" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 7,
      "other" : 0,
      "terms" : [{
        "term" : 5,
        "count" : 4
      }, {
        "term" : 7,
        "count" : 3
      }]
    }
  }
}
Elasticsearch takes the [size] most frequent terms on each shard, not on
the whole index. That is OK for a query_and_fetch request, but when I execute
query_then_fetch I expect to receive the correct answer.
Could you fix this bug, please?
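
To make the suspected mechanism concrete, here is a minimal Python sketch. It is not Elasticsearch code: the function names are hypothetical, and the merge strategy (each shard returns only its own top [size] terms, then the lists are summed) is an assumption inferred from the responses above. Under that assumption it gives the same counts as both responses.

```python
from collections import Counter

# Field values per shard, taken from the reproduction steps above.
shards = [
    [5, 5, 5, 7],  # routing=0: three docs with value 5, one with value 7
    [5, 7, 7],     # routing=1: one doc with value 5, two with value 7
]

def merged_top_terms(shards, size):
    """Assumed behaviour: each shard reports only its own top `size` terms,
    and the coordinating node sums whatever it receives."""
    merged = Counter()
    for docs in shards:
        for term, count in Counter(docs).most_common(size):
            merged[term] += count
    return merged.most_common(size)

def exact_top_terms(shards, size):
    """Reference result: count every term over the whole index first."""
    total = Counter()
    for docs in shards:
        total.update(docs)
    return total.most_common(size)

print(merged_top_terms(shards, 1))  # term 5 is undercounted
print(merged_top_terms(shards, 2))  # both shards report both terms
print(exact_top_terms(shards, 1))   # the count I expect for size 1
```

With size 1, the second shard reports only term 7, so its single document with value 5 never reaches the merge step. With size 2, every shard reports every term and the sums are exact.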

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I suppose you are talking about this issue: "terms facet gives wrong count with n_shards > 1" (Issue #1305, elastic/elasticsearch on GitHub).

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 10 Jul 2013, at 14:13, gregory.bityukov@gmail.com wrote:
