How to get all document _ids of an Elasticsearch index


(shenwno) #1

Hi all,

I'm trying to figure out a way to retrieve every document '_id' (the ES
internal _id) from an index that holds about 20 million documents.
However, when I use the get API, ES pages the results and returns only part
of the data.
I'm not sure whether the bulk API could handle this task, and at the scale of
this index it would still be a heavy query.
Is there any way I can read the ids directly from the raw filesystem?

Thanks for helping
Wei

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b22e2015-f8cc-408a-857a-14a4fbbcf6a0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(vineeth mohan-2) #2

Hello Wei,

You can walk through all the documents in ES using scan and scroll:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html#scroll-scan
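
As a rough illustration of the scroll approach, here is a minimal Python sketch that collects only the _id of each hit. It is not taken from the linked page and rests on several assumptions: a recent Elasticsearch version where the scroll request accepts a JSON body and where sorting on `_doc` takes the place of the older `search_type=scan`, a cluster reachable over plain HTTP, and the hypothetical helper names `http_post_json` and `iter_document_ids`. Older releases may need a different request shape.

```python
import json
import urllib.request


def http_post_json(url, body):
    """POST a JSON body and decode the JSON response (standard library only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def iter_document_ids(base_url, index, page_size=1000, keep_alive="1m",
                      post=http_post_json):
    """Yield every _id in `index`, one scroll page at a time."""
    # The first request opens the scroll context. "_source": False skips the
    # document bodies; sorting on _doc asks for index order (assumed cheapest).
    page = post(
        f"{base_url}/{index}/_search?scroll={keep_alive}",
        {
            "size": page_size,
            "_source": False,
            "sort": ["_doc"],
            "query": {"match_all": {}},
        },
    )
    # An empty page of hits means the scroll is exhausted.
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit["_id"]
        # Each follow-up request sends the latest scroll id and renews the
        # keep-alive window.
        page = post(
            f"{base_url}/_search/scroll",
            {"scroll": keep_alive, "scroll_id": page["_scroll_id"]},
        )
```

Something like `for doc_id in iter_document_ids("http://localhost:9200", "my_index"): print(doc_id)` would then stream the ids without holding 20 million of them in memory at once; the URL and index name here are placeholders.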

Thanks
Vineeth

In reply to Wei Shen (shenwno@gmail.com), Fri, Aug 29, 2014 at 1:22 AM.


