I'm trying to figure out a way to retrieve every document '_id' (the ES
internal _id) from an index that has about 20 million documents.
However, when I use the search API, ES pages the results and returns only
part of the data.
I'm not sure whether the bulk API could handle this task, and at the scale
of this index it would still be a heavy query.
Is there any way I can read the IDs directly from the raw filesystem?
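
A minimal sketch of what walking through every _id page by page could look like, assuming an Elasticsearch version that exposes the scroll API in the form `POST /<index>/_search?scroll=1m` followed by `POST /_search/scroll` (older releases differ). The helper names `iter_all_ids` and `http_transport`, the page size, and the base URL are all hypothetical and only there for illustration; `_source` is switched off so that only the IDs come back.

```python
import json
import urllib.request
from typing import Callable, Iterator

# Hypothetical transport signature: (method, path, body) -> parsed JSON reply.
Transport = Callable[[str, str, dict], dict]


def http_transport(base_url: str) -> Transport:
    """Build a transport that talks JSON over HTTP (base_url is an assumption)."""
    def send(method: str, path: str, body: dict) -> dict:
        req = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method=method,
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return send


def iter_all_ids(send: Transport, index: str,
                 page_size: int = 1000, keep_alive: str = "1m") -> Iterator[str]:
    """Yield every document _id in `index`, one scroll page at a time."""
    # First request opens the scroll; sorting by _doc avoids scoring work.
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}", {
        "size": page_size,
        "_source": False,          # only IDs are needed, skip document bodies
        "sort": ["_doc"],
        "query": {"match_all": {}},
    })
    while True:
        hits = resp["hits"]["hits"]
        if not hits:               # an empty page means the scroll is exhausted
            break
        for hit in hits:
            yield hit["_id"]
        # Follow-up requests pass back the scroll id from the previous reply.
        resp = send("POST", "/_search/scroll", {
            "scroll": keep_alive,
            "scroll_id": resp["_scroll_id"],
        })
```

With a node listening on a local address (an assumption), the IDs could be streamed to a file with something like `for doc_id in iter_all_ids(http_transport("http://localhost:9200"), "myindex"): out.write(doc_id + "\n")`. Because the function is a generator, the 20 million IDs are never held in memory at once.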