Wrong facet counts?

guanyum · October 9, 2012, 8:52am

hi all,

i indexed 4,230,593 docs into a 3-node es cluster, with 6 shards and 2
index types.

i tried a facet query, and the count is lower than i expected. here
is my test:

query1 (match certain results)

curl -X POST "http://10.20.157.111:9200/sourcing/product/_search?size=1&pretty=1" -d '
{
  "query": {"query_string": {"query": "pub_cat_id:145"}},
  "facets": {
    "pub_cat": {"terms": {"field": "pub_cat_id", "size": 1}}
  }
}
'
and the count for pub_cat_id:145 is 501,779, which matches the
result i got from oracle.

query2  (match all query)

curl -X POST "http://10.20.157.111:9200/sourcing/product/_search?size=1&pretty=1" -d '
{
  "query": {"match_all": {}},
  "facets": {
    "pub_cat": {"terms": {"field": "pub_cat_id", "size": 1}}
  }
}
'
and the count for pub_cat_id:145 drops to 500,337.

is this a bug, or am i misusing something? any comments or suggestions are
much appreciated. thanks.

the elasticsearch version i use is 0.19.9, and pub_cat_id is a
single-valued integer. i index docs like this:

XContentFactory.jsonBuilder().startObject().startObject(typeName)
.field("pub_cat_id", aProductDo.getCategoryId())

--

ppearcy · October 10, 2012, 3:19am

Hi,
Take a look at this discussion:
http://elasticsearch-users.115913.n3.nabble.com/Inconsistent-facet-count-td2977512.html

Here is the key info:
"This happens because of the way the distributed facet calculation works.
It gets the top 5 from each shard, and then aggregates it. Because you have
an even distribution of terms, it will not return exact matches. if you
increase the size, you will get better results. One possible value can be
the size times the number of shards, for example: 25."

Try increasing the facet size. It will help, but still isn't bulletproof.
The only bulletproof (albeit crappy) solutions are:

- Have all your data in a single shard
- Have your size set to the number of distinct terms. This is overkill,
  though, and results are usually good enough following Shay's guidance
  (size * # of shards)

Best Regards,
Paul


--

ppearcy · October 12, 2012, 6:43am

Heya,
Check out this thread:
http://elasticsearch-users.115913.n3.nabble.com/Inconsistent-facet-count-td2977512.html

The key comment:
This happens because of the way the distributed facet calculation works. It
gets the top 5 from each shard, and then aggregates it. Because you have an
even distribution of terms, it will not return exact matches. if you
increase the size, you will get better results. One possible value can be
the size times the number of shards, for example: 25.

A couple of reasonable, albeit not ideal, workarounds:

- Request a much higher count than you actually need
- If you need to guarantee exact counts, use a two-pass query. Run the
  first query to get your initial list of terms, then feed those back in
  as part of a second query to get the exact counts.
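
One possible reading of the two-pass idea, sketched as request bodies built in Python. This is an assumption about how to do it, not something spelled out in the thread: the second pass uses one filter facet per term, and the helper names and the "top_terms" facet name are hypothetical. The response shape in `terms_from_response` is what the terms facet is expected to return on this version; check it against your own output before relying on it.

```python
import json


def first_pass_body(field, size):
    """Pass 1: an ordinary terms facet to get the candidate terms."""
    return {
        "query": {"match_all": {}},
        "facets": {"top_terms": {"terms": {"field": field, "size": size}}},
    }


def terms_from_response(response, facet_name):
    """Pull the term values out of a terms facet response (assumed shape)."""
    return [entry["term"] for entry in response["facets"][facet_name]["terms"]]


def second_pass_body(field, terms):
    """Pass 2: one filter facet per candidate term.

    Each shard counts all of its docs matching the filter, so the summed
    count does not depend on a per-shard top-N cutoff.
    """
    facets = {
        "%s_%s" % (field, term): {"filter": {"term": {field: term}}}
        for term in terms
    }
    return {"query": {"match_all": {}}, "facets": facets}


# Hypothetical first-pass response, reusing the count reported in this thread.
sample_response = {
    "facets": {"top_terms": {"_type": "terms", "terms": [{"term": 145, "count": 500337}]}}
}
candidates = terms_from_response(sample_response, "top_terms")
print(json.dumps(second_pass_body("pub_cat_id", candidates), indent=2))
```

The printed body can be sent with the same curl command as in the original post. The cost is a second round trip and one facet per term, so it only makes sense for a short list of terms.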

Best Regards,
Paul

--
