Why do my scroll operations fail?


(Peter Trei) #1

Me again - I fixed my speed problem by moving to another system - I think the one I was on had flaky hardware. But I'm still having problems with my small four-node cluster: reading records keeps crashing.

I'm using elasticsearch-py to read logstash-* indexes and generate secondary data. My indices have up to 60 million records each. To read them, I'm using elasticsearch-py's 'scan' helper:

import elasticsearch
import elasticsearch.helpers as helpers

es = elasticsearch.Elasticsearch(retry_on_timeout=True)
# sets up global ES handle

# main processing loop
def process_index(index_name):
  global es
  query_body = '{"size": 10000, "query": {"match_all":{}}}'
  scanResp = helpers.scan(client=es,query=query_body,scroll="5m",index=index_name,timeout="5m")
  resp={}
  for resp in scanResp:
    # DO STUFF FOR ONE RECORD
    pass

(yes, you saw this before)

My problem is that the reading fails after a while. I'm trying to understand whether my code is wrong, my setup is wrong, or it's simply that my machines are underpowered.

Here's the crash, lightly redacted:

Traceback (most recent call last):

File "pegasus5.py", line 580, in <module>
mymain()
File "pegasus5.py", line 542, in mymain
process_index(indexname)
File "pegasus5.py", line 414, in process_index
for resp in scanResp:
File "build/bdist.linux-x86_64/egg/elasticsearch/helpers/__init__.py", line 230, in scan
File "build/bdist.linux-x86_64/egg/elasticsearch/client/utils.py", line 68, in _wrapped
File "build/bdist.linux-x86_64/egg/elasticsearch/client/__init__.py", line 616, in scroll
File "build/bdist.linux-x86_64/egg/elasticsearch/transport.py", line 308, in perform_request
File "build/bdist.linux-x86_64/egg/elasticsearch/connection/http_urllib3.py", line 86, in perform_request
File "build/bdist.linux-x86_64/egg/elasticsearch/connection/base.py", line 102, in _raise_error
elasticsearch.exceptions.NotFoundError: TransportError(404, u'{"_scroll_id":"c2NhbjswOzE7dG90YWxfaGl0czo0MDc1OTc4NDs=","took":709,"timed_out":false,"_shards":{"total":5,"successful":0,"failed":4,"failures":[{"status":404,
"reason":"RemoteTransportException[[pegasus_101][inet[/XXX.XXX.XXX.101:9300]][indices:data/read/search[phase/scan/scroll]]]; nested: SearchContextMissingException[No search context found for id [258]]; "},{"status":404,
"reason":"RemoteTransportException[[pegasus-109][inet[/XXX.XXX.XXX.109:9300]][indices:data/read/search[phase/scan/scroll]]]; nested: SearchContextMissingException[No search context found for id [263]]; "},{"status":404,
"reason":"RemoteTransportException[[pegasus_101][inet[/XXX.XXX.XXX.101:9300]][indices:data/read/search[phase/scan/scroll]]]; nested: SearchContextMissingException[No search context found for id [257]]; "},{"status":404,
"reason":"SearchContextMissingException[No search context found for id [325]]"}]},"hits":{"total":40759784,"max_score":0.0,"hits":[]}}')

This program was running on node XXX.XXX.XXX.108

What is the reason behind a 'SearchContextMissingException'? Are my systems timing out?
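
One way to test the timing-out theory: the scroll="5m" value is how long Elasticsearch keeps the search context alive between successive scroll requests, so the loop has to finish each batch of records within that window. The sketch below (Python 3, standard library only) wraps any record iterator and reports the slowest batch. The names slowest_batch_seconds, handle, batch_size, and clock are hypothetical and not part of elasticsearch-py, and the number of records actually returned per scroll request is an assumption that would need confirming on the cluster.

```python
import time


def slowest_batch_seconds(records, batch_size, handle, clock=time.monotonic):
    """Process every record and return the longest time spent on one batch.

    records    -- any iterable of records (e.g. the generator from helpers.scan)
    batch_size -- assumed number of records fetched per scroll request
    handle     -- hypothetical per-record processing function
    clock      -- time source, injectable so the function can be tested
    """
    slowest = 0.0
    count = 0
    batch_start = clock()
    for record in records:
        handle(record)
        count += 1
        if count % batch_size == 0:
            # A full batch has been consumed; the next iteration would
            # trigger another scroll request.
            now = clock()
            slowest = max(slowest, now - batch_start)
            batch_start = now
    if count % batch_size:
        # Account for the trailing partial batch.
        slowest = max(slowest, clock() - batch_start)
    return slowest
```

If the reported figure approaches 300 seconds for the real batch size, the search context may well be expiring between requests; a longer scroll value or a smaller size would be the obvious things to try.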

thanks
Peter Trei

