My code is essentially:
for hit in search.scan():
    do_something(hit)  # placeholder for the per-hit processing
I believe scan() uses scrolling to paginate through the results. After running for a while, it fails with this error:
17:23:34 for hit in search.scan():
17:23:34 File "/usr/lib/python2.7/site-packages/elasticsearch_dsl/search.py", line 719, in scan
17:23:34 **self._params
17:23:34 File "/usr/lib/python2.7/site-packages/elasticsearch/helpers/actions.py", line 469, in scan
17:23:34 body={"scroll_id": scroll_id, "scroll": scroll}, **scroll_kwargs
17:23:34 File "/usr/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 84, in _wrapped
17:23:34 return func(*args, params=params, **kwargs)
17:23:34 File "/usr/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 1395, in scroll
17:23:34 "GET", "/_search/scroll", params=params, body=body
17:23:34 File "/usr/lib/python2.7/site-packages/elasticsearch/transport.py", line 358, in perform_request
17:23:34 timeout=timeout,
17:23:34 File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 261, in perform_request
17:23:34 self._raise_error(response.status, raw_data)
17:23:34 File "/usr/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 182, in _raise_error
17:23:34 status_code, error_message, additional_info
17:23:34 elasticsearch.exceptions.NotFoundError: NotFoundError(404, u'search_phase_execution_exception', u'No search context found for id [19079999]')
I've read that this happens because the scroll took too long to respond, and that I would have to calibrate it according to how my cluster is set up (currently a single cluster with a single node; unfortunately, I cannot upgrade it yet).
How do I calibrate it? What do I look for to ensure good performance? Any suggestion is appreciated.
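For context, here is a minimal sketch of where the scroll settings could be passed in. It assumes that elasticsearch_dsl's Search.params() forwards its keyword arguments to elasticsearch.helpers.scan (the `**self._params` line in the traceback suggests this), and that helpers.scan accepts `scroll` (the keep-alive for the search context) and `size` (hits per batch). The helper name scan_with_keepalive and the values "30m" and 500 are illustrative assumptions, not recommendations. As far as I understand, the keep-alive only needs to cover the time spent processing one batch, not the whole scan, so a smaller `size` may also help.

```python
def scan_with_keepalive(search, scroll="30m", size=500):
    """Yield hits from search.scan() using a custom scroll keep-alive and batch size.

    `search` is expected to be an elasticsearch_dsl Search object.
    The default values here are placeholders and should be tuned per cluster.
    """
    # params() returns a copy of the search with the extra keyword arguments
    # attached; scan() then passes them on to elasticsearch.helpers.scan.
    tuned = search.params(scroll=scroll, size=size)
    for hit in tuned.scan():
        yield hit

# Hypothetical usage, with the same loop body as in the question:
# for hit in scan_with_keepalive(search, scroll="30m", size=500):
#     do_something(hit)
```

If the per-hit work is slow, timing how long one batch of `size` hits takes to process should give a lower bound for the `scroll` value.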