Wrong count in terms facet

I am not sure whether this is a bug or I am missing something, but the terms facet is returning wrong counts for terms.

I have a field that uses str_tag_analyzer. I want to build a tag cloud from this field: the top 20 tags along with their counts (how many times each appeared).

The terms facet looked like the solution for this case. My understanding is that the size parameter in a terms facet query controls how many tags are returned.

When I run the terms facet query with different sizes, I get unexpected results. Here are a few of my queries and their results.

Query 1

curl -XGET 'http://localhost:9200/stage_profiles/wrapper_0/_search?pretty=1' -d '
{
  "query": {
    "nested": {
      "query": {
        "field": { "gsid": 222 }
      },
      "path": "medals"
    }
  },
  "from": 0,
  "size": 0,
  "facets": {
    "tags": { "terms": { "field": "field_val_t", "size": 1 } }
  }
}'

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 189,
"max_score" : 1.0,
"hits" : [ ]
},
"facets" : {
"tags" : {
"_type" : "terms",
"missing" : 57,
"total" : 331,
"other" : 316,
"terms" : [ {
"term" : "hyderabad",
"count" : 15
} ]
}
}
}

Query 2

curl -XGET 'http://localhost:9200/stage_profiles/wrapper_0/_search?pretty=1' -d '
{
  "query": {
    "nested": {
      "query": {
        "field": { "gsid": 222 }
      },
      "path": "medals"
    }
  },
  "from": 0,
  "size": 0,
  "facets": {
    "tags": { "terms": { "field": "field_val_t", "size": 3 } }
  }
}'

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 189,
"max_score" : 1.0,
"hits" : [ ]
},
"facets" : {
"tags" : {
"_type" : "terms",
"missing" : 57,
"total" : 331,
"other" : 282,
"terms" : [ {
"term" : "playing",
"count" : 20
}, {
"term" : "hyderabad",
"count" : 15
}, {
"term" : "pune",
"count" : 14
} ]
}
}
}

Query 3

curl -XGET 'http://localhost:9200/stage_profiles/wrapper_0/_search?pretty=1' -d '
{
  "query": {
    "nested": {
      "query": {
        "field": { "gsid": 222 }
      },
      "path": "medals"
    }
  },
  "from": 0,
  "size": 0,
  "facets": {
    "tags": { "terms": { "field": "field_val_t", "size": 10 } }
  }
}'

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 189,
"max_score" : 1.0,
"hits" : [ ]
},
"facets" : {
"tags" : {
"_type" : "terms",
"missing" : 57,
"total" : 331,
"other" : 198,
"terms" : [ {
"term" : "playing",
"count" : 20
}, {
"term" : "hyderabad",
"count" : 19
}, {
"term" : "bangalore",
"count" : 18
}, {
"term" : "pune",
"count" : 16
}, {
"term" : "chennai",
"count" : 16
}, {
"term" : "games",
"count" : 13
}, {
"term" : "testing",
"count" : 11
}, {
"term" : "cricket",
"count" : 9
}, {
"term" : "singing",
"count" : 6
}, {
"term" : "movies",
"count" : 5
} ]
}
}
}

I have the following concerns:

  1. The first query returns a tag with a count of 15, but another tag has a count of 20 (as seen in queries 2 and 3). So it should have returned the "playing" tag with a count of 20.
  2. The second query returns a count of 15 for the "hyderabad" tag, but the third query returns a count of 19 for the same tag.

Please let me know if you need any other info, such as the mapping or the data present in ES.
Thanks

You are running into a frequently hit problem. Request a larger facet
size; see here for more details:

Best Regards,
Paul
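
For anyone wondering why a small size produces these numbers, here is a minimal Python sketch of the effect. It assumes each shard reports only its own top `size` terms and that the partial lists are then summed and cut to `size` again. The per-shard counts below are invented (the real shard contents are not known) and were chosen only so that the merged results resemble the ones posted in this thread; `top_terms` and `terms_facet` are hypothetical helper names, not Elasticsearch APIs.

```python
from collections import Counter

# Hypothetical per-shard term counts (3 shards, as in the posted responses).
# These numbers are made up for illustration; the real data is unknown.
SHARDS = [
    {"hyderabad": 8, "pune": 7, "playing": 7, "bangalore": 5, "chennai": 4},
    {"hyderabad": 7, "playing": 6, "chennai": 6, "bangalore": 4, "pune": 2},
    {"bangalore": 9, "pune": 7, "playing": 7, "chennai": 6, "hyderabad": 4},
]


def top_terms(counts, size):
    """Return the `size` highest (term, count) pairs; ties are broken by term."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]


def terms_facet(shards, size):
    """Sum each shard's local top-`size` list, then keep the global top `size`."""
    merged = Counter()
    for shard in shards:
        for term, count in top_terms(shard, size):
            merged[term] += count
    return top_terms(merged, size)


if __name__ == "__main__":
    for size in (1, 3, 10):
        print(size, terms_facet(SHARDS, size))
```

With this made-up data, size 1 yields only "hyderabad" with 15, because "playing" is never the single top term on any one shard even though its total of 20 is the highest. Size 3 finds "playing" with 20 but still reports "hyderabad" as 15, since the shard where it ranks low never sends its count. Asking for size 10 and keeping only the first entries on the client side gives the full counts in this sketch, which is the idea behind requesting a larger facet size.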

On Monday, July 1, 2013 4:05:45 AM UTC-6, Hridayesh Gupta wrote:

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Wrong-count-in-terms-facet-tp4037300.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.
