Best way to create a list of all _ids in an index (Up to date version)

What is the current best way to query all the document _ids in an index (and dump them to, say, a file)?

I have a 218,000-document database, and about 8,000 of those documents are missing from the index (according to the document counts). To investigate, I need a list of the docs in the index to compare with those in my MariaDB, so I can find the missing ones and try to see why they weren't indexed. So I just need to stream the list of IDs and dump it to a file, ready for import into MariaDB. A simple JOIN will do the rest.

There are older threads on this topic, but it seems much of what is mentioned in the replies has since been deprecated. Currently I am on 17.8.

So: just dump the _ids of all documents into a file, in a way that is efficient for 200k+ docs. No other fields required.

Using a point in time search or a scroll search in combination with _doc sorting (as you don't care about the order) might be a good idea; see "Sort search results" in the Elasticsearch Guide [7.14].

Scrolling through a 218k dataset should not take too much time, so there may be no need to start optimizing; just measure the runtime before going more fancy :)
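
The scroll approach suggested above could look roughly like the following. This is only a minimal sketch, not a tested recipe for any particular cluster: it assumes the 7.x-style scroll REST endpoints and a cluster reachable over plain HTTP without authentication. The helper names (`http_json`, `iter_ids`, `dump_ids`), the page size and the keep-alive value are made up for illustration. Setting `"_source": false` keeps the response down to metadata such as `_id`, and sorting by `_doc` avoids any scoring or ordering work.

```python
import json
import urllib.request


def http_json(method, url, body=None):
    """Send a JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def iter_ids(base_url, index, page_size=1000, keep_alive="1m", send=http_json):
    """Yield the _id of every document in `index` via the scroll API.

    `send` is injectable (an assumption of this sketch) so the paging
    logic can be exercised without a live cluster.
    """
    # First page: open the scroll context, skip _source, sort by _doc.
    page = send(
        "POST", f"{base_url}/{index}/_search?scroll={keep_alive}",
        {"size": page_size, "_source": False, "sort": ["_doc"]},
    )
    scroll_id = page.get("_scroll_id")
    try:
        # An empty page of hits means the scroll is exhausted.
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit["_id"]
            page = send(
                "POST", f"{base_url}/_search/scroll",
                {"scroll": keep_alive, "scroll_id": scroll_id},
            )
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the scroll context instead of waiting for it to expire.
        if scroll_id:
            send("DELETE", f"{base_url}/_search/scroll", {"scroll_id": scroll_id})


def dump_ids(out, ids):
    """Write one _id per line to the file-like object `out`; return the count."""
    count = 0
    for doc_id in ids:
        out.write(f"{doc_id}\n")
        count += 1
    return count


# Example use (URL and index name are placeholders):
# with open("ids.txt", "w", encoding="utf-8") as out:
#     print(dump_ids(out, iter_ids("http://localhost:9200", "my-index")))
```

The resulting file has one _id per line, which should be easy to load into a MariaDB table for the JOIN. Wrapping the call in a simple timer is enough to see whether anything more elaborate is needed.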

