From the documentation:
The size parameter defines how many top terms should be returned out of the
overall terms list. By default, the node coordinating the search process
will ask each shard to provide its own top size terms and once all shards
respond, it will reduce the results to the final list that will then be
sent back to the client. This means that if the number of unique terms is
greater than size, the returned list is slightly off and not accurate (it
could be that the term counts are slightly off and it could even be that a
term that should have been in the top size entries was not returned).
How is it possible that the count for term 2 is 3 in the first response,
but 2 in the second response?
One way to improve accuracy would be to increase shard_size[1]. In
particular, if shard_size is greater than the number of unique values of your
entityId field, results will be accurate. Please beware, however, that this
can be resource intensive.
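
To make the effect concrete, here is a small Python sketch that mimics the
reduce step described in the docs. It is only a simulation under assumed
data, not Elasticsearch code: the two shards, their contents and the
terms_agg helper are all made up for illustration.

```python
from collections import Counter

def terms_agg(shards, size, shard_size=None):
    """Merge per-shard top terms, roughly as the quoted docs describe."""
    shard_size = size if shard_size is None else shard_size
    merged = Counter()
    for shard in shards:
        # Each shard only reports its own top `shard_size` terms.
        ranked = sorted(Counter(shard).items(), key=lambda kv: (-kv[1], kv[0]))
        for term, count in ranked[:shard_size]:
            merged[term] += count
    # The coordinating node reduces the merged counts to the final top `size`.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:size]

# Hypothetical entityId values of the documents held by each shard.
shards = [
    [1, 1, 1, 2, 2],  # shard 0: term 1 x3, term 2 x2
    [1, 1, 3, 3, 2],  # shard 1: term 1 x2, term 3 x2, term 2 x1
]

print(terms_agg(shards, size=2))                # term 2 is undercounted
print(terms_agg(shards, size=2, shard_size=3))  # term 2 gets its full count
```

With shard_size equal to size, shard 1 never reports its single document
for term 2, so the merged count is 2 instead of 3. Once shard_size covers
all three unique values, every shard reports every term and the count is
exact.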
Another option would be to route your indexing requests so that all
documents having the same entityId will end up on the same shard.
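
A rough Python sketch of why routing helps, again as a simulation and not
Elasticsearch code. The route function uses crc32 purely as a stand-in
hash (it is not the hash Elasticsearch uses), and the documents and helper
names are hypothetical.

```python
from collections import Counter
import zlib

def route(entity_id, num_shards):
    # Stand-in hash for illustration; NOT Elasticsearch's routing function.
    return zlib.crc32(str(entity_id).encode()) % num_shards

def index_with_routing(entity_ids, num_shards):
    """Place each document on the shard chosen by its entityId."""
    shards = [[] for _ in range(num_shards)]
    for eid in entity_ids:
        shards[route(eid, num_shards)].append(eid)
    return shards

def top_terms(shards, size):
    """Each shard reports its top `size` terms; the results are then merged."""
    merged = Counter()
    for shard in shards:
        for term, count in Counter(shard).most_common(size):
            merged[term] += count
    return merged.most_common(size)

# Hypothetical documents: entityId 1 five times, 2 three times, 3 twice.
docs = [1] * 5 + [2] * 3 + [3] * 2
shards = index_with_routing(docs, num_shards=2)
print(top_terms(shards, size=2))
```

Because every document with a given entityId lands on the same shard, each
shard already knows the full count for the terms it holds, so nothing is
lost when the per-shard lists are cut to size. In Elasticsearch itself this
would be done by passing a routing value on the index requests.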