Revisiting the "Most efficient way to get all IDs"

Continuing the discussion from Most efficient way to get all ids of a type:

@nik9000 You left a comment stating that if the use case could be modeled as an aggregation, then the column store would be used implicitly, which may improve performance. I am curious to explore this a bit more.
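To make the question concrete, here is one way I could imagine modeling it as an aggregation: paging through a composite aggregation over a keyword field that holds the ID. This is only a sketch of my reading of the suggestion, not something confirmed in the earlier thread. The field name `entity_id` and the `search` callable (anything that takes a request body and returns the response as a dict) are assumptions for illustration.

```python
def composite_body(query, field, page_size=1000, after=None):
    """Build a search body that pages over distinct values of `field`."""
    composite = {
        "size": page_size,
        "sources": [{"id": {"terms": {"field": field}}}],
    }
    if after is not None:
        # Resume after the last bucket key of the previous page.
        composite["after"] = after
    # "size": 0 suppresses the hits; only the aggregation is returned.
    return {"size": 0, "query": query, "aggs": {"ids": {"composite": composite}}}


def ids_via_composite(search, query, field, page_size=1000):
    """Collect every distinct value of `field` matching `query`.

    `search` is a hypothetical callable: request body in, response dict out.
    """
    ids, after = [], None
    while True:
        agg = search(composite_body(query, field, page_size, after))["aggregations"]["ids"]
        ids.extend(bucket["key"]["id"] for bucket in agg["buckets"])
        after = agg.get("after_key")
        if after is None or not agg["buckets"]:
            return ids
```

The pages are fetched sequentially here, so I have no idea yet whether this would beat a parallel sliced scroll in practice.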

My use case is that I want to snapshot the set of IDs for a specific search result. These IDs are shared entity identifiers that other services can use for their own purposes. The application using ES is for search/discovery, but this set of IDs can serve as input to other things.

The Scroll API works pretty well. In an extreme case, I can extract ~950k IDs in ~19s using a parallel sliced scroll, and a more reasonable set of 12k IDs takes as little as 700ms. Still, I am curious whether any other optimizations come to mind. I have tried different combinations of slice size, replicas, and max slices.
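For reference, the parallel sliced scroll looks roughly like the sketch below. It is a simplified illustration rather than my exact code: `start_scroll` and `next_page` are hypothetical callables standing in for the initial search request (with a `scroll` timeout) and the follow-up scroll requests, and the defaults for `max_slices` and `page_size` are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def slice_body(query, slice_id, max_slices, page_size=10_000):
    """Build the search body for one slice of a sliced scroll."""
    return {
        "query": query,
        "slice": {"id": slice_id, "max": max_slices},
        "_source": False,  # only the hit metadata (_id) is needed
        "sort": ["_doc"],  # index order, no scoring
        "size": page_size,
    }


def drain_slice(start_scroll, next_page, body):
    """Collect every _id from one slice.

    start_scroll(body) and next_page(scroll_id) are hypothetical callables
    that return the raw response dict of each request.
    """
    ids = []
    resp = start_scroll(body)
    while resp["hits"]["hits"]:
        ids.extend(hit["_id"] for hit in resp["hits"]["hits"])
        resp = next_page(resp["_scroll_id"])
    return ids


def collect_ids(start_scroll, next_page, query, max_slices=4, page_size=10_000):
    """Drain all slices in parallel, one worker thread per slice."""
    bodies = [slice_body(query, i, max_slices, page_size) for i in range(max_slices)]
    with ThreadPoolExecutor(max_workers=max_slices) as pool:
        parts = pool.map(lambda b: drain_slice(start_scroll, next_page, b), bodies)
        return [doc_id for part in parts for doc_id in part]
```

Disabling `_source` and sorting by `_doc` are the two request-level settings I know of that keep each page cheap; the rest of the tuning has been the slice and page size combinations mentioned above.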

