I am not sure whether this is a bug or I am missing something, but the terms
facet is returning wrong counts for terms.
I have a field that uses str_tag_analyzer. I want to build a tag cloud from
this field: the top 20 tags along with their counts (how many times each
tag appears).
The terms facet looked like the solution for this case. My understanding is
that the size parameter in a terms facet query controls how many tags are
returned.
When I run the terms facet query with different sizes, I get unexpected results.
Here are a few of my queries and their results.
Query 1
curl -XGET 'http://localhost:9200/stage_profiles/wrapper_0/_search?pretty=1' -d '
{
query : {
"nested" : {
"query" : {
"field" : {
"gsid" : 222
}
},
"path" : "medals"
}
}, from: 0, size: 0
,
facets: {
"tags" : { "terms" : {"field" : "field_val_t", size: 1} }
}
}'
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 189,
"max_score" : 1.0,
"hits" : [ ]
},
"facets" : {
"tags" : {
"_type" : "terms",
"missing" : 57,
"total" : 331,
"other" : 316,
"terms" : [ {
"term" : "hyderabad",
"count" : 15
} ]
}
}
}
Query 2
curl -XGET 'http://localhost:9200/stage_profiles/wrapper_0/_search?pretty=1' -d '
{
query : {
"nested" : {
"query" : {
"field" : {
"gsid" : 222
}
},
"path" : "medals"
}
}, from: 0, size: 0
,
facets: {
"tags" : { "terms" : {"field" : "field_val_t", size: 3} }
}
}'
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 189,
"max_score" : 1.0,
"hits" : [ ]
},
"facets" : {
"tags" : {
"_type" : "terms",
"missing" : 57,
"total" : 331,
"other" : 282,
"terms" : [ {
"term" : "playing",
"count" : 20
}, {
"term" : "hyderabad",
"count" : 15
}, {
"term" : "pune",
"count" : 14
} ]
}
}
}
Query 3
curl -XGET 'http://localhost:9200/stage_profiles/wrapper_0/_search?pretty=1' -d '
{
query : {
"nested" : {
"query" : {
"field" : {
"gsid" : 222
}
},
"path" : "medals"
}
}, from: 0, size: 0
,
facets: {
"tags" : { "terms" : {"field" : "field_val_t", size: 10} }
}
}'
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 189,
"max_score" : 1.0,
"hits" : [ ]
},
"facets" : {
"tags" : {
"_type" : "terms",
"missing" : 57,
"total" : 331,
"other" : 198,
"terms" : [ {
"term" : "playing",
"count" : 20
}, {
"term" : "hyderabad",
"count" : 19
}, {
"term" : "bangalore",
"count" : 18
}, {
"term" : "pune",
"count" : 16
}, {
"term" : "chennai",
"count" : 16
}, {
"term" : "games",
"count" : 13
}, {
"term" : "testing",
"count" : 11
}, {
"term" : "cricket",
"count" : 9
}, {
"term" : "singing",
"count" : 6
}, {
"term" : "movies",
"count" : 5
} ]
}
}
}
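The three requests above differ only in the facet size. As a minimal sketch, the same request body can be generated as strict JSON from one helper. The name build_facet_query is hypothetical; the field names, the nested path, and the gsid value are copied from the queries above.

```python
import json


def build_facet_query(facet_size):
    """Build the search body used above, varying only the terms facet size."""
    return {
        "query": {
            "nested": {
                "path": "medals",
                "query": {"field": {"gsid": 222}},
            }
        },
        "from": 0,
        "size": 0,  # no hits wanted, only the facet
        "facets": {
            "tags": {"terms": {"field": "field_val_t", "size": facet_size}}
        },
    }


if __name__ == "__main__":
    # Print one request body per facet size used in this post.
    for facet_size in (1, 3, 10):
        print(json.dumps(build_facet_query(facet_size)))
```

Each printed line can be passed as the -d payload of the curl commands shown above.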
I have the following concerns:
- The first query returns a tag with a count of 15, but there is another
tag with a count of 20 (as seen in queries 2 and 3). So it should return the
"playing" tag with count 20.
- The 2nd query returns a count of 15 for the "hyderabad" tag, but the 3rd
query returns a count of 19 for the same tag.
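
To make the discrepancy concrete, here is a small Python sketch that re-checks the numbers. The counts are copied from the three responses above; the helper names find_inconsistencies and is_self_consistent are my own. It checks that the returned counts plus "other" add up to "total" in each response, and lists the terms whose count changes with the facet size.

```python
# Facet results copied from the three responses above, keyed by facet size.
responses = {
    1: {"total": 331, "other": 316, "terms": {"hyderabad": 15}},
    3: {"total": 331, "other": 282,
        "terms": {"playing": 20, "hyderabad": 15, "pune": 14}},
    10: {"total": 331, "other": 198,
         "terms": {"playing": 20, "hyderabad": 19, "bangalore": 18,
                   "pune": 16, "chennai": 16, "games": 13, "testing": 11,
                   "cricket": 9, "singing": 6, "movies": 5}},
}


def find_inconsistencies(responses):
    """Return {term: {size: count}} for terms whose count varies by size."""
    seen = {}
    for size, facet in responses.items():
        for term, count in facet["terms"].items():
            seen.setdefault(term, {})[size] = count
    return {term: counts for term, counts in seen.items()
            if len(set(counts.values())) > 1}


def is_self_consistent(facet):
    """True when the returned counts plus 'other' add up to 'total'."""
    return sum(facet["terms"].values()) + facet["other"] == facet["total"]


if __name__ == "__main__":
    for size, facet in responses.items():
        print(size, is_self_consistent(facet))
    print(find_inconsistencies(responses))
```

Each response is internally consistent (the returned counts plus "other" equal "total"), yet the counts for "hyderabad" and also "pune" change between facet sizes.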
Please let me know if you need any other info, such as the mapping or the data
present in ES.
Thanks