This is a long-standing issue with ES, see "Terms facet gives wrong count
with n_shards > 1"
(opened 6 Sep 2011, 09:32 AM UTC; closed 14 Jul 2015, 08:29 PM UTC; labels: >enhancement, high hanging fruit)
I'm working with nested documents and have noticed that my faceted search interface is giving the wrong counts when I have more than one shard. To be more specific, I'm working with RDF triples (entity > attribute > value) and I'm nesting the attributes (called predicates in my example):
```
{
  "_id": "512a2c022f0b4e3daa341e6c8bcf6c2f",
  "url": "http://dbpedia.org/resource/Alan_Shepard",
  "predicates": [
    {
      "type": "type",
      "string_value": ["thing", "person", "astronaut"]
    }, {
      "type": "label",
      "string_value": ["Alan Shepard"]
    }, {
      "type": "time in space",
      "float_value": [216.950]
    },
    ... lots more
  ]
}
```
I've created a shell script (https://gist.github.com/1196986) that recreates the problem with a fresh index. The created data set has these totals:
- thing (30)
- creative work (20)
- video game (10)
- tv show (10)
- people (10)
With only **one shard** the following query gives the correct counts no matter what the size parameter is set to:
```
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "facets": {
    "type_counts": {
      "terms": {
        "field": "string_value",
        "size": 5
      },
      "nested": "predicates",
      "facet_filter": {
        "term": {
          "type": "type"
        }
      }
    }
  }
}
```
However, with **more than one shard** the size parameter affects the accuracy of the counts. If it is equal to or greater than the number of terms returned by the facet query (5 in this case), it works fine. But as you reduce the size parameter, the terms at the bottom of the list start to show counts that are too low:
With "size" : 4
- thing (30)
- creative work (20)
- video game (10)
- **tv show (9)**
With "size" : 3
- thing (30)
- **creative work (15)**
- **video game (9)**
With "size" : 2
- thing (30)
- **creative work (15)**
So it looks like the sub-totals from some of the shards aren't being included for some reason. BTW, I'm on Ubuntu, and the problem seems to affect every version of ES I've tried (0.17.0, 0.17.1 and 0.17.6). Any ideas...?
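One plausible mechanism, offered as an assumption rather than a confirmed reading of the ES source, is this. Each shard computes only its own top `size` terms, and the node that merges the shard responses sums whatever it receives. If a term falls below the cut-off on one shard, that shard's contribution to its count is lost entirely. The Python sketch below simulates this with a made-up two-shard split of the data set above. The per-shard numbers, `SHARDS`, `merged_facet` and its `shard_size` argument are all hypothetical illustrations, not ES APIs:

```python
from collections import Counter

# Hypothetical split of the five-term data set across two shards.
# Only the per-term totals (30/20/10/10/10) come from the example above;
# the per-shard numbers are made up for illustration.
SHARDS = [
    {"thing": 16, "creative work": 12, "people": 7, "video game": 4, "tv show": 1},
    {"thing": 14, "creative work": 8, "people": 3, "video game": 6, "tv show": 9},
]


def merged_facet(shards, size, shard_size=None):
    """Simulate a terms facet merged from per-shard top-N lists.

    Each shard reports only its own `shard_size` most frequent terms
    (defaulting to `size`). The merge step sums whatever it receives and
    keeps the overall top `size`. A term that misses a shard's cut-off
    gets no contribution from that shard.
    """
    per_shard = size if shard_size is None else shard_size
    merged = Counter()
    for shard in shards:
        for term, count in Counter(shard).most_common(per_shard):
            merged[term] += count
    return merged.most_common(size)


for size in (4, 3, 2):
    print(size, merged_facet(SHARDS, size))
# 4 [('thing', 30), ('creative work', 20), ('video game', 10), ('tv show', 9)]
# 3 [('thing', 30), ('creative work', 20), ('tv show', 9)]
# 2 [('thing', 30), ('creative work', 12)]

# Over-fetching every distinct term per shard gives exact counts again.
print(merged_facet(SHARDS, 2, shard_size=5))
# [('thing', 30), ('creative work', 20)]
```

In this toy model, letting each shard report at least as many terms as there are distinct values restores exact counts. That is consistent with the observation that a `size` of 5 or more works fine.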
P.S. absolutely loving ES - it's made my life a lot easier :)
-rr
On Friday, July 19, 2013 12:30:27 PM UTC+2, Sandy Van den Borne wrote:
In ES 0.90.0:
I have the following query:
{
  "fields": ["company_city.untouched"],
  "query": {"query_string": {"query": "antwerpen", "default_operator": "AND"}},
  "facets": {
    "cities": {"terms": {"field": "company_city.untouched", "size": "12"}},
    "test": {"filter": {"term": {"company_city.untouched": "Brasschaat"}}}
  }
}
Which gives me:
"facets": {
  "cities": {
    "_type": "terms",
    "missing": 0,
    "total": 122,
    "other": 45,
    "terms": [
      ...
      {
        "term": "Brasschaat",
        "count": 3
      },
      ...
    ]
  },
  "test": {
    "_type": "filter",
    "count": 4
  }
I don't understand why the count for "Brasschaat" is not 4 in both
facets (by the way, I get the same result when I don't specify a size).
Now as soon as I set the size to a value bigger than 12, the count of the
cities facet for "Brasschaat" changes to 4.
So when I run:
{
  "fields": ["company_city.untouched"],
  "query": {"query_string": {"query": "antwerpen", "default_operator": "AND"}},
  "facets": {
    "cities": {"terms": {"field": "company_city.untouched", "size": "13"}},
    "test": {"filter": {"term": {"company_city.untouched": "Brasschaat"}}}
  }
}
I get:
"facets": {
  "cities": {
    "_type": "terms",
    "missing": 0,
    "total": 122,
    "other": 45,
    "terms": [
      ...
      {
        "term": "Brasschaat",
        "count": 4
      },
      ...
    ]
  },
  "test": {
    "_type": "filter",
    "count": 4
  }
The same thing happens for one of the other cities. The counts of all
the other cities do not change.
This is on my local machine, the data is not changing and there is only
one shard.
Any idea what is happening here?
Am I doing something wrong?
Thanks!
Sandy