Search_after query doesn't return correct results for pagination

I have a process that queries the watcher-history indices for a specified time range. The number of results exceeds 10,000 docs, which is the limit, so I tried to use search_after with a sort, but the results are incorrect.

The following example query matches a total of 385 docs. Both queries use a size of 200 and the same sort, and the 2nd query also uses search_after, referencing the sort values of the last hit in the first query's result set. My problem is that both queries return 200 records, where I would expect the 2nd one to return the remaining 185. Why is this, and what am I doing wrong? (NOTE: at first my search_after was the result.execution_time by itself, then I added the _id of the watcher, then I added the sort value, etc.)

1st query:
GET .watcher-history*/_search
{
  "size": 200,
  "_source": [
    "result.execution_time",
    "result.condition.met",
    "result.condition.status",
    "result.actions",
    "metadata.name",
    "state"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "metadata.name: Digital_*",
            "analyze_wildcard": true,
            "default_field": "*"
          }
        },
        {
          "range": {
            "result.execution_time": {
              "gte": "now-12m"
            }
          }
        }
      ]
    }
  },
  "sort": [
    {"result.execution_time": {"order": "asc"}},
    {"_id": "asc"}
  ]
}

2nd query:

GET .watcher-history*/_search
{
  "size": 200,
  "_source": [
    "result.execution_time",
    "result.condition.met",
    "result.condition.status",
    "result.actions",
    "metadata.name",
    "state"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "metadata.name: Digital_*",
            "analyze_wildcard": true,
            "default_field": "*"
          }
        },
        {
          "range": {
            "result.execution_time": {
              "gte": "now-12m"
            }
          }
        }
      ]
    }
  },
  "sort": [
    {"result.execution_time": {"order": "asc"}},
    {"_id": "asc"}
  ],
  "search_after": [
    1629395698152,
    "Digital_dev_httpsctp-gateway-stgcodebig2net_alternate_11ee33da-42c8-40ac-8f53-04ce42c9692a-2021-08-19T17:54:58.152683Z"
  ]
}

Is it possible that the watcher history gets updated between those two calls because watches are running? While search_after is optimized to require less memory, it does not use a point-in-time snapshot of the data.
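
To make the moving-data point concrete, here is a minimal sketch of a search_after loop. It is not a definitive implementation: `paginate` and the `search` callable are hypothetical names, and `search` stands in for whatever client call sends the request body and returns the parsed JSON response. The two ideas it illustrates are copying the cursor verbatim from the `sort` array of the last hit, and reusing one request body for every page. Since `now-12m` is resolved again on each request, replacing it with a fixed timestamp in that body should also keep the time window identical between pages.

```python
from typing import Any, Callable, Dict, Iterator


def paginate(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],  # hypothetical transport
    base_body: Dict[str, Any],
    page_size: int = 200,
) -> Iterator[Dict[str, Any]]:
    """Yield every hit, page by page, using search_after."""
    search_after = None
    while True:
        # Same query and sort on every page; only the cursor changes.
        body = dict(base_body, size=page_size)
        if search_after is not None:
            body["search_after"] = search_after
        hits = search(body)["hits"]["hits"]
        if not hits:
            return  # an empty page means everything has been read
        yield from hits
        # Take the cursor from the last hit's "sort" array as returned,
        # rather than rebuilding it from _source fields.
        search_after = hits[-1]["sort"]
```

With the 385 matching docs from the question and a page size of 200, this loop would issue three requests (200 hits, 185 hits, then an empty page), provided the underlying data and the time window do not change in between.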

You may want to take a look at scroll search or, if you have a newer Elasticsearch version, go with point-in-time (PIT) readers for this task.
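
As a rough sketch of the PIT variant, the flow is: open a PIT, page through it with search_after, then close it. The assumptions here are loud ones: `paginate_with_pit` and the `send(method, path, body)` callable are hypothetical names for illustration, the cluster is assumed to be new enough to support PIT, and the open call is assumed to accept the same wildcard index expression that `_search` does. Please verify both against your version before relying on this.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport: sends one HTTP request, returns the parsed JSON body.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def paginate_with_pit(
    send: Send,
    index_pattern: str,
    query: Dict[str, Any],
    sort: List[Dict[str, Any]],
    page_size: int = 200,
    keep_alive: str = "1m",
) -> Iterator[Dict[str, Any]]:
    """Yield all hits from a single point-in-time view of the indices."""
    # Open the PIT; the response carries its id.
    opened = send("POST", f"/{index_pattern}/_pit?keep_alive={keep_alive}", None)
    pit_id = opened["id"]
    search_after = None
    try:
        while True:
            body: Dict[str, Any] = {
                "size": page_size,
                "query": query,
                "sort": sort,
                # The PIT names the indices, so the search path has no index.
                "pit": {"id": pit_id, "keep_alive": keep_alive},
            }
            if search_after is not None:
                body["search_after"] = search_after
            resp = send("POST", "/_search", body)
            # The PIT id may change between responses; always use the latest.
            pit_id = resp.get("pit_id", pit_id)
            hits = resp["hits"]["hits"]
            if not hits:
                return
            yield from hits
            search_after = hits[-1]["sort"]
    finally:
        # Close the PIT so it does not hold resources until keep_alive expires.
        send("DELETE", "/_pit", {"id": pit_id})
```

Because every page reads from the same snapshot, watches that run while you are paginating should not shift the result set between requests.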


I can't use point-in-time because the watcher-history indices don't have an alias, and the PIT call requires an alias. I can't use a specific watcher index name either, because this call is dynamic (the watcher-history index name is a moving target).
