Strange results when trying to run an aggregation query

Hello,

I am trying to run a query similar to the following:

  {
    "aggs": {
      "authors": {
        "terms": {
          "field": "packageAuthors.keyword"
        }
      }
    }
  }

Initially, this seems to be working exactly as I would expect it to. Within the results that are returned, one of the aggregated values comes back with:

        {
          "key" : "Gary Ewan Park",
          "doc_count" : 8
        }

To me, this meant that 8 of the documents matched by the query contained Gary Ewan Park as an author. However, I know that this is not the case: there are 9 documents that contain this author.

After testing, I found that I could get the aggregation query to return the correct value by changing it to the following:

  {
    "aggs": {
      "authors": {
        "terms": {
          "field": "packageAuthors.keyword",
          "size": 1000
        }
      }
    }
  }

Notice the addition of the size property. I thought that the size property only controlled the number of aggregated values (buckets) that would be returned, but it seems to have more of an effect than I realized. Can someone explain what is going on here?

This is my first time posting in this forum, so if this is not the correct place to post this question, please let me know.

Thanks
Gary

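shard_size is what you're after. With multiple shards (in a distributed system), each shard computes its own term counts and returns only its top shard_size terms to the coordinating node, which merges those lists. If a term misses the cut on one shard, that shard's documents are left out of the term's total, so its doc_count can come back too low. Raising shard_size trades performance for accuracy.

That size has an impact here is a side effect: by default, shard_size is calculated from size (size * 1.5 + 10).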
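For illustration, a sketch of the first query with `size` left at its default of 10 and `shard_size` raised explicitly. The value 100 is an arbitrary assumption, not a recommendation; a suitable value depends on how many distinct authors each shard holds. `show_term_doc_count_error` is an existing terms-aggregation option that adds a per-bucket `doc_count_error_upper_bound` to the response, giving a worst-case estimate of how far each count might be off.

```json
{
  "aggs": {
    "authors": {
      "terms": {
        "field": "packageAuthors.keyword",
        "size": 10,
        "shard_size": 100,
        "show_term_doc_count_error": true
      }
    }
  }
}
```

If a bucket reports a `doc_count_error_upper_bound` of 0, its count should be exact; a non-zero value suggests raising `shard_size` further.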

PS: This is one of several tradeoffs between performance and accuracy in Elasticsearch. I have a presentation on them with three examples, and what you ran into is the first one: Make Your Data FABulous


Thank you for taking the time to respond here, and also on Twitter, really appreciate it!

We don't have that much data compared to other folks, so dropping to a single shard seems like the right approach for us at the minute. Thank you!
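A minimal sketch of the single-shard approach, assuming the index is created fresh: the settings body below would be sent with a create-index request (`PUT /<index-name>`, where the index name is a placeholder). The number of primary shards is fixed when an index is created, so an existing multi-shard index would need to be reindexed into a new one (or reduced with the shrink API) to get there.

```json
{
  "settings": {
    "number_of_shards": 1
  }
}
```

With a single primary shard, every term is counted in one place, so the per-shard cutoff described above no longer drops documents from a term's total.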
