Hi
I am trying to retrieve all records from an Elasticsearch index:
import elasticsearch
from elasticsearch import Elasticsearch

ret_size = 0
cluster = "dvfarmil"
es = Elasticsearch([{'host': 'YYYYYYY', 'port': 80}])
res = es.search(index='XXXXXXXXXX',
                scroll='50m',
                body={"size": 10000, "query": {"bool": {"must": [{"match": {"clustername": "OOOOO"}}]}}})
sid = res['_scroll_id']
while ret_size > 0:
    res = es.scroll(scroll_id=sid, scroll='50m')
    ret_size += len(res['hits']['hits'])
    if sid == res['_scroll_id']:
        exit(0)
    else:
        sid = res['_scroll_id']
The scroll_id isn't changing, and I know that the index contains tens of thousands of records.
Please advise.
Mahmood
I want to retrieve all docs. I don't know how many there are, but there are certainly more than 300,000.
What is the correct way to do it?
What I have done isn't working.
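For reference, here is a minimal sketch of a scroll loop that stops on an empty page rather than on a changed scroll id. In the snippet above, ret_size starts at 0, so the while loop never runs, and Elasticsearch may legitimately return the same _scroll_id on consecutive pages, so comparing ids is not a reliable stop condition. The function name scroll_all and its page_size and keep_alive parameters are hypothetical; the client calls (search, scroll, clear_scroll) are the ones already used in the question, and this assumes a client version that still accepts the body argument.

```python
def scroll_all(es, index, query, page_size=1000, keep_alive="5m"):
    """Yield every hit matching `query`, one scroll page at a time.

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    """
    # The initial search opens the scroll context and already returns the first page.
    res = es.search(index=index, scroll=keep_alive,
                    body={"size": page_size, "query": query})
    sid = res["_scroll_id"]
    hits = res["hits"]["hits"]
    try:
        # An empty page means the scroll is exhausted. The scroll id may stay
        # the same between pages, so it is not used as the stop condition.
        while hits:
            for hit in hits:
                yield hit
            res = es.scroll(scroll_id=sid, scroll=keep_alive)
            sid = res["_scroll_id"]  # always continue with the most recent id
            hits = res["hits"]["hits"]
    finally:
        # Release the server-side scroll context once we are done.
        es.clear_scroll(scroll_id=sid)
```

Usage would look something like `for hit in scroll_all(es, 'XXXXXXXXXX', {"match": {"clustername": "OOOOO"}}): ...`. The Python client also ships elasticsearch.helpers.scan, which wraps the same scroll pattern and may be simpler than a hand-written loop.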
I prefer to run this query: { "query": { "match_all": {} }, "size": 5000, "from": 0 }
Then I loop over the following pages by modifying the from clause for each one.
Anyway, you can check the exact total number of docs in the response: { "took": 94, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": 852705, "max_score": 1,...}}. In my case I have 852705 documents.
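A rough sketch of the from/size approach, with one caveat: by default Elasticsearch rejects requests where from + size exceeds index.max_result_window (10000 unless the index setting has been raised), so this kind of paging cannot reach all 852705 documents on its own, and scroll is the usual choice past that point. The helper names total_hits and page_requests and the max_window parameter are hypothetical. total_hits also handles the object form of hits.total that Elasticsearch 7.x and later return.

```python
def total_hits(response):
    """Return the hit count from a search response.

    Elasticsearch 6.x and earlier report hits.total as an integer; 7.x and
    later report an object such as {"value": 852705, "relation": "eq"}.
    """
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def page_requests(total, size=5000, max_window=10000):
    """Build match_all request bodies for from/size paging.

    Paging stops at `max_window`, which is assumed to mirror the index's
    index.max_result_window setting (10000 by default).
    """
    bodies = []
    for start in range(0, min(total, max_window), size):
        # Shrink the last page so that from + size never exceeds the window.
        page_size = min(size, max_window - start)
        bodies.append({"query": {"match_all": {}}, "size": page_size, "from": start})
    return bodies
```

Each body returned by page_requests can be passed to es.search(index=..., body=...) in turn.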