Skipped records with Search/SearchAfter query

kuhu.88 · September 12, 2022, 4:04pm

Hello folks,

Sorting on a non-unique key (in this example, the timestamp) can cause a subsequent search_after request to skip records when the previous request hit the 10,000-result limit and records with the same timestamp continue beyond that cutoff.

For example, suppose that in one page of 10,000 results, half of the records have timestamp=t1 and half have timestamp=t2, and that the next 10,000 records also include some (say 500) with timestamp=t2. With the current implementation, the value in the search_after array for the next query is t2, so Elasticsearch skips those 500 entries in the next response. This can return far fewer results than expected. The expected count comes from the _count API in ES, and the results of scroll match it. The search_after queries returned an order of magnitude fewer results in some of the test runs.
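To make the mechanism concrete, here is a small pure-Python model of the behavior described above. It does not talk to Elasticsearch: `search_after_page`, `paginate`, and the sample documents are hypothetical stand-ins, and the model assumes that search_after returns only hits whose sort values are strictly greater than the values supplied.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical sample document: (timestamp, unique id).
Doc = Tuple[str, int]
SortKey = Callable[[Doc], tuple]


def search_after_page(docs: List[Doc], key: SortKey,
                      after: Optional[tuple], size: int) -> List[Doc]:
    """Return up to `size` docs whose sort key is strictly greater than `after`."""
    ordered = sorted(docs, key=key)
    if after is not None:
        ordered = [d for d in ordered if key(d) > after]
    return ordered[:size]


def paginate(docs: List[Doc], key: SortKey, size: int) -> List[Doc]:
    """Page through `docs`, resuming each page after the last hit's sort key."""
    seen: List[Doc] = []
    after: Optional[tuple] = None
    while True:
        page = search_after_page(docs, key, after, size)
        if not page:
            return seen
        seen.extend(page)
        after = key(page[-1])


# 5 docs at t1 and 10 docs at t2, so a page of 10 ends in the middle of t2.
docs: List[Doc] = [("t1", i) for i in range(5)] + [("t2", i) for i in range(5, 15)]

# Sort on the non-unique timestamp only.
by_timestamp = paginate(docs, key=lambda d: (d[0],), size=10)

# Sort on the timestamp plus a unique field acting as a tie-breaker.
by_timestamp_and_id = paginate(docs, key=lambda d: (d[0], d[1]), size=10)

print(len(docs), len(by_timestamp), len(by_timestamp_and_id))
```

In this model the timestamp-only sort loses the five t2 documents that fall past the page boundary, while the sort that includes a unique second key returns every document.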

An example query:

curl -X GET "localhost:9200/indexname*/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 10000,
  "_source": ["abc.123", "abc.124"],
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  },
  "sort": [
    {"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}}
  ]
}
'

The next query would be something like:

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 10000,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-4h/h",
        "lt": "now/h"
      }
    }
  },
  "sort": [
    {"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}}
  ],
  "search_after": [
    "2022-06-22T03:00:28.000Z"
  ]
}
'
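As a rough sketch of how the follow-up body relates to the previous response, the snippet below copies the `sort` array of the last hit into `search_after`. It only builds request bodies and sends nothing. The helper `next_page_body` and the sample `last_hit` are hypothetical, and the sketch assumes the follow-up request reuses the first request's query and sort unchanged.

```python
import copy
import json


def next_page_body(previous_body: dict, last_hit: dict) -> dict:
    """Copy the previous request body and resume after the last hit's sort values."""
    body = copy.deepcopy(previous_body)
    # Each hit of a sorted search carries a "sort" array with one value per sort key.
    body["search_after"] = list(last_hit["sort"])
    return body


# Body of the first request, as in the example query above.
first_body = {
    "size": 10000,
    "_source": ["abc.123", "abc.124"],
    "query": {
        "range": {"@timestamp": {"gte": "now-1d/d", "lt": "now/d"}}
    },
    "sort": [
        {"@timestamp": {"order": "asc",
                        "format": "strict_date_optional_time_nanos"}}
    ],
}

# Hypothetical last hit of the first response; only "sort" is used here.
last_hit = {"_id": "hypothetical-id", "sort": ["2022-06-22T03:00:28.000Z"]}

second_body = next_page_body(first_body, last_hit)
print(json.dumps(second_body, indent=2))
```

Because the sort has a single key here, the `search_after` array has a single value, which is the situation where records sharing that timestamp are skipped.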

Using a point in time (PIT) does not seem to mitigate this issue. We need to go through every matching record, but the PIT appears to be used only as a tie-breaker, and the timestamp seems to take precedence, which means the skipping still occurs. I would appreciate any ideas and thoughts on next steps.

ES version: 7.16.2

kuhu.88 · October 4, 2022, 9:54pm

Any updates on this?

system · November 1, 2022, 9:55pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
