Count query is returning inaccurate results

kapso · May 30, 2021, 3:33pm

Count query (filter term) is returning inaccurate results - */_count

We have tried to get the count using a regular search query using this approach - "size":0,"track_total_hits":true and the count is still inaccurate.

Basically, the counts are the same in both the above queries.

In both the above cases, this is the _shards response -
"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}

We are running v7.13 on the Elastic Cloud. Our index is about ~35m documents. We are running a Staging & Production server on the Cloud (with the same data) and the counts are invalid on both the servers.

We don't do any real-time continuous updates to the index - no updates, no deletes. We add all the ~35m documents via bulk updates using background jobs initially, and no more updates after that.

How do we know the counts are inaccurate - well we were so confused by the inaccuracies, we looped through each document using scroll API to run a loop and do a manual count, and indeed our index/data looks ok, but Elasticsearch, unfortunately, is returning the wrong counts.

How do we solve this issue?

Christian_Dahlqvist · May 31, 2021, 7:09am

What does the output of the cat indices API look like? Have you refreshed after you finished bulk loading, especially if you optimized by changing the refresh interval during loading? What does the full queries and results look like?

What do you get if you run GET /<PUT INDEX PATTERN HERE>/_count?q=*:* ?

kapso · May 31, 2021, 8:04am

@Christian_Dahlqvist Thanks for your response.

We did not change or optimize the refresh interval during loading
Just ran POST /_refresh and it came back under a second with the following response (but the counts did not change)
```
  {
    "_shards": {
      "successful": 43,
      "failed": 0,
      "total": 46
    }
  }
```
Following is the output of cat indices
yellow open products_20210523141120879 Y76XjK2OTbeE4i_8EI6o6g 1 1 37142644 49 59.4gb 59.4gb

Following is the response of the /_count?q=*:* ?

  {
    "count": 37142644,
    "_shards": {
      "successful": 1,
      "failed": 0,
      "skipped": 0,
      "total": 1
    }
  }

Christian_Dahlqvist · May 31, 2021, 8:08am

This is the number of documents the index contains. Are you expecting a different number?

If you are expecting a different number, how did you index the data? If you used bulk requests, did you check that all writes were successful?

kapso · May 31, 2021, 8:10am

@Christian_Dahlqvist that total count 37142644 for the index looks about ok.

Also, we have retries built in our background jobs, so I dont think we run into any problems when doing bulk writes.

Christian_Dahlqvist · May 31, 2021, 8:11am

What is then giving inaccurate results?

kapso · May 31, 2021, 8:12am

The inaccuracies are when we run a filter query like this

{"query":{"bool":{"filter":[{"terms":{"level_2_classification_ids":[261243]}}]}},"size":0,"track_total_hits":true}

Christian_Dahlqvist · May 31, 2021, 8:15am

What result do you get and what are you expecting? What is the mapping for the field you are searching on?

If you are looking for the count of documents that match a filter criteria I would recommend to use an aggregation.

kapso · May 31, 2021, 8:20am

What we get is 199,421 and what we are expecting is 202,164, and some of the filter queries on other ids are even way off
This is the mapping for that field...

    "level_2_classification_ids": {
              "type": "long"
            },

We are using v7.13 (its the latest & greatest)

Christian_Dahlqvist · May 31, 2021, 8:29am

I vaguely recall there was some optimization introduced recently that improved performance by cutting the query short, which could result in inaccurate counts. Haven't found the blog post though yet, so could very well be mistaken. I would recommend trying an aggregation with a filter and see if that gives accurate results.

I wonder if the track_total_hits parameter mentioned here makes a difference. Can yoiu try setting this to true?

kapso · May 31, 2021, 8:42am

Yea I am running the search query with track_total_hits true, you need that for queries that return counts greater than 10,000

Also, /_count query is returning same inaccurate results.

Hmm, ok will try with aggregations.

kapso · June 1, 2021, 8:51am

So apparently, aggregations also return the same inaccurate counts. Here's the aggs query...

{
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "level_2_classification_ids",
        "size": 1,
        "include": [
          261243
        ]
      }
    }
  },
  "size": 0
}

Christian_Dahlqvist · June 1, 2021, 8:57am

How are you deducing that the result is wrong?

kapso · June 1, 2021, 9:12am

Using the above approach.

system · June 29, 2021, 9:13am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Results returned by elasticsearch are inaccurate Elasticsearch	1	414	July 6, 2017
0 hits, 14693688 total on index with 700 documents Elasticsearch	15	2239	November 7, 2017
Wrong values in _count API in ES 5.3.0? Elasticsearch	4	1073	May 11, 2017
Elastic _count returns incorrect count? Elasticsearch	1	327	December 22, 2020
Index not refreshed. Query count result not accurate Elasticsearch	7	1870	July 6, 2017

Count query is returning inaccurate results

Related topics