Speeding up deep pagination for a large IDs query

I've got an Elasticsearch index with ~100M documents, and typically need to search within a subset of them using an IDs query combined with other search terms/filters.

These subsets of IDs are dynamic and come from an external system, so aren't something we can keep indexed in Elasticsearch.

When paginating with the Point in Time functionality, fetching each subsequent page of results (size=10000 with search_after) becomes extremely slow when the query contains a large list of IDs.

For example, with a list of ~200,000 IDs, fetching each page of 10,000 results using PIT and search_after takes ~20 seconds, so fetching 100,000 documents this way takes over 3 minutes.

Are there any good ways to speed this up? I'd have expected the initial query containing such a large number of IDs to be slow, but it appears that the cost of using such a large number of IDs is paid every time the next set of results is fetched, rather than just once.

For info, my search looks something like:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {"query": "some query"}
        },
        {
          "ids": {"values": [... ~200k integer IDs ...]}
        }
      ]
    }
  },
  "_source": false,
  "fields": ["title"],
  "size": 10000,
  "track_total_hits": true,
  "track_scores": false,
  "pit": {
    "id":  "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", 
    "keep_alive": "1m"
  },
  "sort": "title",
  "search_after": [...]                       
}
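
For completeness, the paging loop looks roughly like the sketch below. This is an illustration using plain HTTP via Python's requests rather than my real code; the endpoint, index name, and the load_ids_from_external_system/process helpers are placeholders:

import requests

ES = "http://localhost:9200"           # placeholder endpoint
ids = load_ids_from_external_system()  # placeholder: ~200k IDs

# Open a point in time against the index (placeholder index name)
pit_id = requests.post(f"{ES}/my-index/_pit?keep_alive=1m").json()["id"]

search_after = None
while True:
    body = {
        "query": {
            "bool": {
                "must": [
                    {"query_string": {"query": "some query"}},
                    {"ids": {"values": ids}},
                ]
            }
        },
        "_source": False,
        "fields": ["title"],
        "size": 10000,
        "track_total_hits": True,
        "track_scores": False,
        "pit": {"id": pit_id, "keep_alive": "1m"},
        "sort": "title",
    }
    if search_after is not None:
        body["search_after"] = search_after

    # Note: no index in the path when searching with a PIT
    resp = requests.post(f"{ES}/_search", json=body).json()
    hits = resp["hits"]["hits"]
    if not hits:
        break

    process(hits)                    # placeholder
    search_after = hits[-1]["sort"]  # cursor for the next page
    pit_id = resp["pit_id"]          # the PIT id can change between pages

requests.delete(f"{ES}/_pit", json={"id": pit_id})

As the sketch shows, every page re-sends the full bool query, including all ~200k IDs, which is presumably where the repeated cost comes from.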
