Facet counts change when setting size from 12 to 13

In ES 0.90.0:

I have the following query:

{
  "fields": ["company_city.untouched"],
  "query": {
    "query_string": { "query": "antwerpen", "default_operator": "AND" }
  },
  "facets": {
    "cities": {
      "terms": { "field": "company_city.untouched", "size": "12" }
    },
    "test": {
      "filter": { "term": { "company_city.untouched": "Brasschaat" } }
    }
  }
}

Which gives me:

"facets": {
  "cities": {
    "_type": "terms",
    "missing": 0,
    "total": 122,
    "other": 45,
    "terms": [
      ...
      {
        "term": "Brasschaat",
        "count": 3
      },
      ...
    ]
  },
  "test": {
    "_type": "filter",
    "count": 4
  }
}

I do not understand why the count for "Brasschaat" is not 4 in both
facets (by the way, I get the same result when not specifying a size).

Now as soon as I set the size to a value bigger than 12, the count of the
cities facet for "Brasschaat" changes to 4.

So when I run:
{
  "fields": ["company_city.untouched"],
  "query": {
    "query_string": { "query": "antwerpen", "default_operator": "AND" }
  },
  "facets": {
    "cities": {
      "terms": { "field": "company_city.untouched", "size": "13" }
    },
    "test": {
      "filter": { "term": { "company_city.untouched": "Brasschaat" } }
    }
  }
}

I get:
"facets": {
  "cities": {
    "_type": "terms",
    "missing": 0,
    "total": 122,
    "other": 45,
    "terms": [
      ...
      {
        "term": "Brasschaat",
        "count": 4
      },
      ...
    ]
  },
  "test": {
    "_type": "filter",
    "count": 4
  }
}

The same thing happens for one of the other cities. The counts of all other
cities do not change.

This is on my local machine; the data is not changing and there is only one
shard.

Any idea what is happening here?
Am I doing something wrong?

Thanks!

Sandy

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

This is normal behaviour.

When you run a terms facet, the top x terms are calculated on each shard,
and those results are then merged to produce the total counts.
The list of top x terms is not necessarily the same on every shard, so the
merged result may contain counts lower than the real ones, and even a wrong
top x.
Here, we "solved" the problem by faceting with a size of x+20 and then
keeping only the first x entries.
This increases accuracy, but it does not guarantee 100% accurate counts.
It was enough for us anyway.
If you need 100% accuracy, you will have to facet with a size equal to the
total number of distinct terms in that field.
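
To make the merge step above concrete, here is a small Python sketch. It is only a toy model of the behaviour described in this reply, not Elasticsearch's actual implementation: the function names, the `request_size` parameter and the two-shard data set are all made up for illustration.

```python
from collections import Counter


def shard_top_terms(docs, size):
    """Return the `size` most frequent terms on one shard as (term, count) pairs."""
    counts = Counter(docs)
    # Sort by descending count, then by term, so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:size]


def merged_facet(shards, size, request_size=None):
    """Merge per-shard top lists and keep the global top `size` terms.

    `request_size` (hypothetical name) models the workaround of asking each
    shard for more terms than needed and truncating afterwards.
    """
    request_size = size if request_size is None else request_size
    merged = Counter()
    for docs in shards:
        # Each shard only reports its own top `request_size` terms.
        for term, count in shard_top_terms(docs, request_size):
            merged[term] += count
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:size]


# Hypothetical data: "Brasschaat" occurs 4 times in total, but only once on
# the second shard, where it falls outside that shard's top 2.
shards = [
    ["Antwerpen"] * 5 + ["Brasschaat"] * 3 + ["Mortsel"] * 1,
    ["Antwerpen"] * 4 + ["Mortsel"] * 2 + ["Brasschaat"] * 1,
]

print(merged_facet(shards, size=2))                  # "Brasschaat" undercounted as 3
print(merged_facet(shards, size=2, request_size=3))  # "Brasschaat" counted as 4
```

In this toy data set, asking each shard for one extra term is enough to recover the full count. In general the oversampling only makes a miss less likely; counts are only guaranteed when every shard reports all of its distinct terms.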

Hope it helps.


This is a long-standing issue with ES, see "Terms facet gives wrong count
with n_shards > 1"

-rr

On Friday, July 19, 2013 12:30:27 PM UTC+2, Sandy Van den Borne wrote:
