Search & Scroll using python

Moshe_Hayun · October 31, 2018, 12:34pm

Hi,

I am trying to use the search & scroll mechanism to query a large amount of data, and I have run into an issue.
https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.Elasticsearch.search

From time to time, due to connectivity issues, I get IncompleteRead (expected to read <x> more bytes) followed by MaxRetryError.

Using tcpdump, I noticed that I receive only a partial JSON response when this occurs.
I can't avoid the connectivity issues in my organization.

My problem is that when this error occurs, the scroll (from the Elasticsearch API) doesn't repeat the same request.
It moves on to the next request, and I lose some of the data.

E.g. I am querying 1 million records in chunks of 1000.
If I hit 100 connectivity issues along the way, each one costs me a full chunk of 1000, so I receive only 900k records.

I would rather not implement this on my own with retries on the same request.
Is there a way, using the API, to retry the same request when it fails?
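
For context, a hand-rolled retry might look roughly like the sketch below. Note that it is not the scroll API: a scroll cursor is advanced on the server side, so repeating a scroll call whose response was lost would most likely return the next page rather than the missed one. The sketch instead pages with search_after, where the cursor is held by the client and a retry repeats the identical request. Everything here is an assumption for illustration: the function name iter_hits_with_retry, the retry and backoff parameters, and the requirement that the caller passes a sort ending in a unique field (a hypothetical record_id, for example). The broad except is also a placeholder and should be narrowed to the client's connection errors in real use.

```python
import time


def iter_hits_with_retry(client, index, query, sort, page_size=1000,
                         max_retries=5, backoff_seconds=1.0):
    """Yield every hit, re-sending a failed page request with the same cursor.

    Hypothetical helper: `sort` must end in a unique field so that
    search_after can resume exactly after the last hit received.
    """
    search_after = None  # cursor is kept on the client, not on the server
    while True:
        body = {"query": query, "sort": sort, "size": page_size}
        if search_after is not None:
            body["search_after"] = search_after

        # Retry the *same* request; the cursor only moves after a success.
        for attempt in range(max_retries + 1):
            try:
                response = client.search(index=index, body=body)
                break
            except Exception:  # assumption: narrow this to connection errors
                if attempt == max_retries:
                    raise
                time.sleep(backoff_seconds * (attempt + 1))

        hits = response["hits"]["hits"]
        if not hits:
            return  # an empty page means everything has been read
        for hit in hits:
            yield hit
        # Resume after the sort values of the last hit on this page.
        search_after = hits[-1]["sort"]
```

Unlike scroll, this does not hold a snapshot of the index, so documents written while paging may or may not show up in the results.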

Thanks a lot.

Moshe_Hayun · November 5, 2018, 12:05pm

Checking again in case someone is familiar with this.

system · December 3, 2018, 12:05pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
