Speeding up deep pagination for a large IDs query

I've got an Elasticsearch index with ~100M documents, and typically need to search within a subset of them using an IDs query combined with other search terms/filters.

These subsets of IDs are dynamic and come from an external system, so aren't something we can keep indexed in Elasticsearch.

When paginating with the Point in Time functionality, fetching each subsequent page of results (size=10000 with search_after) becomes extremely slow when the query contains a large list of IDs.

For example, with a list of ~200,000 IDs, fetching each page of 10,000 results using PIT and search_after takes ~20 seconds, so fetching 100,000 documents this way takes over 3 minutes.

Are there any good ways to speed this up? I'd have expected the initial query containing such a large number of IDs to be slow, but it appears that the cost of using such a large number of IDs is paid every time the next set of results is fetched, rather than just once.

For info, my search looks something like:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {"query": "some query"}
        },
        {
          "ids": {"values": [... ~200k integer IDs ...]}
        }
      ]
    }
  },
  "_source": false,
  "fields": ["title"],
  "size": 10000,
  "track_total_hits": true,
  "track_scores": false,
  "pit": {
    "id":  "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", 
    "keep_alive": "1m"
  },
  "sort": "title",
  "search_after": [...]                       
}
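
For completeness, the paging loop looks roughly like the sketch below. This is an illustration using plain HTTP via Python's requests rather than my real code; the endpoint, index name, and the load_ids_from_external_system/process helpers are placeholders:

import requests

ES = "http://localhost:9200"           # placeholder endpoint
ids = load_ids_from_external_system()  # placeholder: ~200k IDs

# Open a point in time against the index (placeholder index name)
pit_id = requests.post(f"{ES}/my-index/_pit?keep_alive=1m").json()["id"]

search_after = None
while True:
    body = {
        "query": {
            "bool": {
                "must": [
                    {"query_string": {"query": "some query"}},
                    {"ids": {"values": ids}},
                ]
            }
        },
        "_source": False,
        "fields": ["title"],
        "size": 10000,
        "track_total_hits": True,
        "track_scores": False,
        "pit": {"id": pit_id, "keep_alive": "1m"},
        "sort": "title",
    }
    if search_after is not None:
        body["search_after"] = search_after

    # Note: no index in the path when searching with a PIT
    resp = requests.post(f"{ES}/_search", json=body).json()
    hits = resp["hits"]["hits"]
    if not hits:
        break

    process(hits)                    # placeholder
    search_after = hits[-1]["sort"]  # cursor for the next page
    pit_id = resp["pit_id"]          # the PIT id can change between pages

requests.delete(f"{ES}/_pit", json={"id": pit_id})

As the sketch shows, every page re-sends the full bool query, including all ~200k IDs, which is presumably where the repeated cost comes from.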
