Revisiting the "Most efficient way to get all IDs"

bruth · June 11, 2019, 11:03am

Continuing the discussion from Most efficient way to get all ids of a type:

@nik9000 You left a comment stating that if the use case could be modeled as an aggregation, the column store would be used implicitly, which may improve performance. I'm curious to explore this a bit more.
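For reference, here is one way I could imagine modeling this as an aggregation: paging through a composite aggregation over a keyword field that mirrors the ID. This is only a sketch under assumptions, not necessarily what was meant in the original comment. The field name `entity_id` is hypothetical (a keyword field with doc values), and `search` is a placeholder for whatever callable sends the body to the `_search` endpoint and returns the parsed JSON.

```python
from typing import Callable, Iterator, Optional


def composite_ids_body(query: dict, field: str, page_size: int = 1000,
                       after: Optional[dict] = None) -> dict:
    """Build a _search body that returns no hits, only one page of
    composite-aggregation buckets keyed on `field`."""
    composite = {
        "size": page_size,
        "sources": [{"id": {"terms": {"field": field}}}],
    }
    if after is not None:
        # Resume from the after_key returned with the previous page.
        composite["after"] = after
    return {"size": 0, "query": query, "aggs": {"ids": {"composite": composite}}}


def iter_ids(search: Callable[[dict], dict], query: dict, field: str,
             page_size: int = 1000) -> Iterator[str]:
    """Yield every distinct value of `field` for documents matching `query`.

    `search` is an assumption: any callable that takes a request body and
    returns the parsed _search response.
    """
    after = None
    while True:
        resp = search(composite_ids_body(query, field, page_size, after))
        agg = resp["aggregations"]["ids"]
        for bucket in agg["buckets"]:
            yield bucket["key"]["id"]
        after = agg.get("after_key")
        # No buckets or no after_key means the last page has been consumed.
        if after is None or not agg["buckets"]:
            return
```

With a real client, `search` could be something like a thin wrapper around the client's search call for the target index. Whether this beats scroll would need to be measured.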

My use case is that I want to snapshot the set of IDs for a specific search result. These IDs are shared entity identifiers that other services can use for their own purposes. The application using ES is for search/discovery, but this set of IDs can serve as input to other things.

The Scroll API works pretty well. In an extreme case, I can extract ~950k IDs in ~19s using a parallel sliced scroll, and a more reasonable set of 12k IDs takes as little as 700ms. Still, I'm curious whether any other optimizations come to mind. I have tried different combinations of slice size, replica count, and max slices.
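To make the setup concrete, this is roughly the shape of the parallel sliced scroll, written as a minimal sketch rather than my exact code. The helper names are made up for illustration, and `fetch_slice` is a placeholder for a function that runs one slice's scroll to completion and returns the `_id` values it saw. The request options themselves (`slice`, `_source: false`, sorting by `_doc`) are standard scroll settings.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Set


def slice_body(query: dict, slice_id: int, max_slices: int,
               size: int = 1000) -> dict:
    """Build the initial scroll request body for one slice."""
    body = {
        "size": size,        # hits per scroll page
        "query": query,
        "_source": False,    # only the _id is needed, skip loading _source
        "sort": ["_doc"],    # index order, the cheapest order for scrolling
    }
    if max_slices > 1:
        # Elasticsearch requires slice.max to be greater than 1,
        # so a single "slice" is just a plain scroll.
        body["slice"] = {"id": slice_id, "max": max_slices}
    return body


def collect_ids(fetch_slice: Callable[[dict], Iterable[str]], query: dict,
                max_slices: int, size: int = 1000) -> Set[str]:
    """Run one scroll per slice in parallel and merge the IDs.

    `fetch_slice` is an assumption: any callable that takes the body from
    slice_body(), scrolls until no hits remain, and returns the IDs.
    """
    bodies = [slice_body(query, i, max_slices, size) for i in range(max_slices)]
    ids: Set[str] = set()
    with ThreadPoolExecutor(max_workers=max_slices) as pool:
        for chunk in pool.map(fetch_slice, bodies):
            ids.update(chunk)
    return ids
```

The knobs I have been varying correspond to `size` and `max_slices` here, plus the replica count on the index side.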

