Hi all!
First of all, this is my first post here, so please forgive me if I make any mistakes :).
I am trying to retrieve a large number of documents from an Elasticsearch index using a simple exists query. The query matches 71790 documents, so to process them correctly I am using the helpers.scan function: helpers.scan(elastic_search, query=doc, index=index, size=1000, scroll='1d'). My script retrieves every document that contains a certain field (which I call the temporal field) and assigns the value of that temporal field to the non-temporal field of the same document.
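For reference, here is a simplified sketch of that workflow, assuming the elasticsearch-py 7.x client with its `helpers.scan` and `helpers.bulk` helpers. It is an illustration, not the actual script: `temporal_field`, `non_temporal_field`, `build_update_actions` and `copy_temporal_field` are hypothetical names, the temporal field is assumed to be a top-level `_source` key, and the index is assumed to be typeless (on 6.x the bulk actions would also need `_type`).

```python
# Minimal sketch, assuming elasticsearch-py 7.x and a typeless index.
# All field and function names below are hypothetical placeholders.
TEMPORAL_FIELD = "temporal_field"
TARGET_FIELD = "non_temporal_field"

# Same kind of query as in the post: every document that has the temporal field.
QUERY = {"query": {"exists": {"field": TEMPORAL_FIELD}}}


def build_update_actions(hits, source_field, target_field):
    """Turn scanned hits into partial-update actions for helpers.bulk."""
    for hit in hits:
        yield {
            "_op_type": "update",
            "_index": hit["_index"],
            "_id": hit["_id"],
            # Copy the temporal value into the non-temporal field of the same document.
            "doc": {target_field: hit["_source"][source_field]},
        }


def copy_temporal_field(es, index, scroll="25m"):
    """Scan every matching document and bulk-update it in place."""
    # Imported lazily so build_update_actions can be used without the client installed.
    from elasticsearch import helpers

    hits = helpers.scan(es, query=QUERY, index=index, size=1000, scroll=scroll)
    # Returns a (number_of_successes, errors) tuple; by default, item failures
    # raise BulkIndexError instead.
    return helpers.bulk(es, build_update_actions(hits, TEMPORAL_FIELD, TARGET_FIELD))
```

Since `helpers.bulk` pulls from the `scan` generator lazily, scrolling and updating happen in the same loop instead of loading all 71790 hits into memory first.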
Initially I had a scroll time of 25 minutes, but since I was getting this error I increased it to 1 day to check whether the scroll time was really the problem, and I still get the error. The code obviously does not take a day to run: after about 5 minutes, once roughly 50k documents have been processed, this error pops up. Everything I find online says the cause is the scroll time, but that does not seem to be the case here, because the scroll is already set to 1 day.
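Regarding the scroll time: in the scroll API, the `scroll` value is a keep-alive that every scroll request renews, not a limit on the whole iteration. The loop below is a simplified, hand-written version of what `helpers.scan` does (it omits the helper's error handling and extra options); `scroll_all` is a hypothetical name, and the calls assume the elasticsearch-py 7.x style of `search`, `scroll` and `clear_scroll`.

```python
def scroll_all(es, index, query, page_size=1000, keep_alive="25m"):
    """Yield every hit of `query`, one scroll page at a time (simplified helpers.scan)."""
    # The first request opens the search context with a keep-alive of `keep_alive`.
    resp = es.search(index=index, body=query, scroll=keep_alive, size=page_size)
    scroll_id = resp["_scroll_id"]
    try:
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            # Each scroll request fetches the next page AND resets the keep-alive,
            # so `keep_alive` only has to cover processing one page, not the whole run.
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Free the search context on the server once iteration ends or is abandoned.
        es.clear_scroll(scroll_id=scroll_id)
```

With `size=1000`, the keep-alive therefore only needs to outlast the processing of 1000 documents between two scroll requests.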
Does anyone have any idea what could be causing this error? If you need more information about the node or anything else, feel free to ask; I don't know exactly what information I should add.
Thank you all!!!
The specific error is as follows:
    for hit in result:
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/elasticsearch/helpers/actions.py", line 459, in scan
    body={"scroll_id": scroll_id, "scroll": scroll}, **scroll_kwargs
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 84, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 1307, in scroll
    "GET", "/_search/scroll", params=params, body=body
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/elasticsearch/transport.py", line 358, in perform_request
    timeout=timeout,
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 257, in perform_request
    self._raise_error(response.status, raw_data)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 182, in _raise_error
    status_code, error_message, additional_info
elasticsearch.exceptions.NotFoundError: NotFoundError(404, u'search_phase_execution_exception', u'No search context found for id [12671416]')