Hi there,
I am using the Elasticsearch Python API to run some calculations on data stored in my ES cluster. For the calculation I need to fetch all documents that satisfy certain conditions and extract certain fields from them, so I am scrolling with a size of 1000 and a scroll keep-alive of 1 second. I have written a Python script that uses elasticsearch-py to do the job for me.
However, after a little more than 1400 scrolls the script always quits with the error "Segmentation fault (core dumped)". I have tried increasing the scroll size to 10000 instead, but the same problem happens. The following is the part of the script where I do the scrolling:
es = Elasticsearch()
page = es.search(
    index=my_index,
    scroll="1s",
    size=1000,
    body={
        "_source": ["_id", "@timestamp", my_field],
        "query": {
            "bool": {
                "must": [
                    {"exists": {"field": my_field}},
                    {"exists": {"field": "@timestamp"}},
                ]
            }
        },
    },
)
sid = page['_scroll_id']
scroll_size = page['hits']['total']
while scroll_size > 0:
    print("Scrolling...")
    # Get the number of results returned by the last scroll
    scroll_size = len(page['hits']['hits'])
    print("scroll size: " + str(scroll_size))
    page = es.scroll(scroll_id=sid, scroll='1s')
    # Update the scroll ID
    sid = page['_scroll_id']
I was able to determine that the scroll call itself (the line that fetches the next page) is responsible for the error. I have checked the scroll ID, and it stays the same (at least until the error is thrown).
Has anybody faced a similar problem, or does anybody know how this can be solved?