Terms Facet order by Count ... totally broken?

Simple Use Case:
https://gist.github.com/nbauernfeind/9e77615c7c2e57f6e8a5

Basically, create 11 documents all of which contain 'A' and several other
strings. A sample document:
{
"str": ["A", "B", "C", "D", "Q", "R"]
}

Then do a terms facet ordered by count with size = 1 (I expected it to return
the term 'a'):
$ curl -XPOST "http://localhost:9200/facet_test/user/_search?pretty=true" -d '
{
  "from" : 0,
  "size" : 0,
  "facets" : {
    "str" : {
      "terms" : {
        "field" : "str",
        "size" : 1,
        "order" : "count"
      }
    }
  }
}'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 11,
    "max_score" : 1.0,
    "hits" : [ ]
  },
  "facets" : {
    "str" : {
      "_type" : "terms",
      "missing" : 1,
      "total" : 33,
      "other" : 28,
      "terms" : [ {
        "term" : "d",
        "count" : 5
      } ]
    }
  }
}

Note that the same term comes back with a different count in this second
query (with size = 2):
$ curl -XPOST "http://localhost:9200/facet_test/user/_search?pretty=true" -d '
{
  "from" : 0,
  "size" : 0,
  "facets" : {
    "str" : {
      "terms" : {
        "field" : "str",
        "size" : 2,
        "order" : "count"
      }
    }
  }
}'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 11,
    "max_score" : 1.0,
    "hits" : [ ]
  },
  "facets" : {
    "str" : {
      "_type" : "terms",
      "missing" : 1,
      "total" : 33,
      "other" : 19,
      "terms" : [ {
        "term" : "d",
        "count" : 7
      }, {
        "term" : "c",
        "count" : 7
      } ]
    }
  }
}

Is this a bug or do I have absolutely no idea how this should work?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

This is due to the low doc count and default 5 shards and the way counts
are calculated across shards. See
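
To make the cross-shard effect concrete, here is a rough Python sketch. It is a simplified model, not Elasticsearch's actual code: it assumes each shard returns only its own top `size` terms and the coordinating node sums whatever it receives. The shard contents and the names `top_terms` and `faceted_top` are made up for illustration.

```python
from collections import Counter


def top_terms(counts, size):
    """Return the `size` highest-count (term, count) pairs, ties broken by term."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]


def faceted_top(shards, size):
    """Merge per-shard top-`size` lists the way a coordinating node might."""
    merged = Counter()
    for shard in shards:
        # Each shard only reports its local top `size` terms; the rest is lost.
        for term, count in top_terms(shard, size):
            merged[term] += count
    return top_terms(merged, size)


# Hypothetical per-shard term counts. True totals: d = 7, c = 7.
SHARDS = [
    Counter({"d": 3, "c": 2}),
    Counter({"d": 2, "c": 1}),
    Counter({"c": 4, "d": 2}),
]

print(faceted_top(SHARDS, 1))  # [('d', 5)]
print(faceted_top(SHARDS, 2))  # [('c', 7), ('d', 7)]
```

With size = 1 the third shard reports only 'c', so its two 'd' hits never reach the merge and 'd' comes back as 5 instead of 7. With size = 2 every shard reports both terms and the counts are complete, which matches the shape of the results in the original post.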

On Friday, May 24, 2013, Nate Bauernfeind wrote:

--
Thanks,
Matt Weber

Note also that "a" is a stop word for the default standard analyzer used on strings, so my guess is that it's not indexed.
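
As a rough illustration of that guess, the Python sketch below approximates the analysis as lowercasing followed by English stop word removal. This is an assumption about the analyzer's behaviour, not its real implementation; the stop list is the default English set as I understand it, and `analyze` is a hypothetical helper.

```python
# Assumed default English stop word list; treat it as illustrative.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def analyze(values):
    """Lowercase each value, split on whitespace, and drop stop words."""
    tokens = []
    for value in values:
        for word in value.lower().split():
            if word not in STOP_WORDS:
                tokens.append(word)
    return tokens


print(analyze(["A", "B", "C", "D", "Q", "R"]))  # ['b', 'c', 'd', 'q', 'r']
```

Under these assumptions 'A' never becomes an indexed term, so the facet cannot return it regardless of how many documents contain it.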

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On May 25, 2013, at 01:50, Nate Bauernfeind nate.bauernfeind@gmail.com wrote:

Thank you guys very much. Looks like I'll be going with a size = size * 3
approach and hope that gets me mostly accurate results.
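
Roughly what I have in mind, as a Python sketch: ask for three times as many terms as needed, then trim the list on the client. The helper names `build_request` and `trim_terms` are hypothetical, the factor of 3 is just my guess at a reasonable margin, and this does not guarantee exact counts.

```python
def build_request(field, wanted, factor=3):
    """Build a terms facet request that over-asks by `factor`."""
    return {
        "from": 0,
        "size": 0,
        "facets": {
            field: {
                "terms": {
                    "field": field,
                    "size": wanted * factor,  # over-request from every shard
                    "order": "count",
                }
            }
        },
    }


def trim_terms(response, field, wanted):
    """Keep only the first `wanted` terms of the facet in a search response."""
    return response["facets"][field]["terms"][:wanted]


# Example using the facet section of the size = 2 response from this thread.
sample_response = {
    "facets": {
        "str": {
            "terms": [
                {"term": "d", "count": 7},
                {"term": "c", "count": 7},
            ]
        }
    }
}

print(build_request("str", 1)["facets"]["str"]["terms"]["size"])  # 3
print(trim_terms(sample_response, "str", 1))  # [{'term': 'd', 'count': 7}]
```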

Thanks,
Nate

On Fri, May 24, 2013 at 7:34 PM, David Pilato david@pilato.fr wrote:
