How can I speed up retrieving all documents in an index?

Hello, I have a question.

My index contains 50,000,000 documents.

I want to retrieve all of them with Python, using code like the following:

    import elasticsearch

    body_str = {"query": {"match_all": {}}}
    es_client = elasticsearch.Elasticsearch("address")
    # Open the scroll context and fetch the first page of up to 1000 hits.
    doc = es_client.search(index='index', body=body_str, request_timeout=60, scroll='1m', size=1000)
    scroll_size = len(doc['hits']['hits'])
    sid = doc['_scroll_id']
    total_cnt = doc['hits']['total']
    while scroll_size > 0:
        print(doc)
        # Fetch the next page, then refresh the scroll id and page size so the loop can end.
        doc = es_client.scroll(scroll_id=sid, scroll='10m', request_timeout=60)
        sid = doc['_scroll_id']
        scroll_size = len(doc['hits']['hits'])

However, it takes too long to retrieve all the documents.

I would like to speed this up.

The network bandwidth between the Python node and the Elasticsearch node is 1 Gbit/s.

Can you give me some advice?

Thank you.

I'd recommend reading https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#sliced-scroll
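
As a rough sketch of what the sliced scroll described at that link could look like from Python: the snippet below splits the scroll into several slices and reads them in parallel threads. It reuses the call style from the question (`body=`, `size=`, `scroll=`), which may need adjusting for your client version. The function names `scroll_slice` and `scroll_all_sliced`, the slice count of 4, and the use of a thread pool are my own assumptions, not something taken from the documentation.

```python
from concurrent.futures import ThreadPoolExecutor


def scroll_slice(es_client, index, slice_id, max_slices, page_size=1000):
    """Read one slice of the index; return the number of hits seen.

    es_client is assumed to be an elasticsearch.Elasticsearch instance.
    """
    body = {
        # Each worker asks for its own slice; "max" must be at least 2.
        "slice": {"id": slice_id, "max": max_slices},
        "query": {"match_all": {}},
    }
    doc = es_client.search(index=index, body=body, scroll="1m",
                           size=page_size, request_timeout=60)
    seen = 0
    while doc["hits"]["hits"]:
        seen += len(doc["hits"]["hits"])  # process the hits here instead
        doc = es_client.scroll(scroll_id=doc["_scroll_id"], scroll="1m",
                               request_timeout=60)
    return seen


def scroll_all_sliced(es_client, index, max_slices=4):
    """Run one scroll per slice in parallel and return the total hit count."""
    with ThreadPoolExecutor(max_workers=max_slices) as pool:
        counts = pool.map(
            lambda i: scroll_slice(es_client, index, i, max_slices),
            range(max_slices),
        )
        return sum(counts)
```

You would call it as `scroll_all_sliced(es_client, 'index')`. Whether this actually helps depends on where the bottleneck is, so it is worth measuring with a few different slice counts.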

And on the same page, let me show you this part:

Scroll requests have optimizations that make them faster when the sort order is _doc. If you want to iterate over all documents regardless of the order, this is the most efficient option:

    GET /_search?scroll=1m
    {
      "sort": [
        "_doc"
      ]
    }
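
For the Python client, the same idea might look roughly like the sketch below: the `_doc` sort goes into the request body, and the loop picks up the new scroll id after every page. The generator name `iter_all_docs` and the `page_size` parameter are hypothetical names I made up for illustration, and the call style follows the question's code, so treat it as a starting point and not as the official way to do it.

```python
def iter_all_docs(es_client, index, page_size=1000):
    """Yield every hit in the index, scrolling in _doc order.

    es_client is assumed to be an elasticsearch.Elasticsearch instance.
    """
    body = {"query": {"match_all": {}}, "sort": ["_doc"]}
    doc = es_client.search(index=index, body=body, scroll="1m",
                           size=page_size, request_timeout=60)
    sid = doc["_scroll_id"]
    try:
        while doc["hits"]["hits"]:
            yield from doc["hits"]["hits"]
            # Always continue from the most recent scroll id.
            doc = es_client.scroll(scroll_id=sid, scroll="1m",
                                   request_timeout=60)
            sid = doc["_scroll_id"]
    finally:
        # Free the scroll context on the cluster once we are done.
        es_client.clear_scroll(scroll_id=sid)
```

Usage would be `for hit in iter_all_docs(es_client, 'index'): ...`. The client library also ships `elasticsearch.helpers.scan`, which wraps a similar scroll loop, so that may be worth a look before writing your own.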
