Segmentation fault when scrolling in ES Python

Hi there,

I am using the Elasticsearch Python client to calculate some values from the data stored in my ES cluster. For the calculation I need to retrieve all documents that satisfy certain conditions and extract certain fields from them, so I run a scroll with a size of 1000 and a scroll keep-alive of 1 second. I have written a Python script that uses the ES Python client to do this.

However, the script always exits with the error "Segmentation fault (core dumped)" after a little more than 1400 scrolls. I tried increasing the scroll size to 10000, but the same problem occurs. Here is the part of the script that does the scrolling:

```python
from elasticsearch import Elasticsearch

# my_index and my_field are defined elsewhere in the script
page = Elasticsearch().search(index = my_index, scroll = "1s", size = 1000, body = {
    "_source": ["_id", "@timestamp", my_field],
    "query": {"bool": {"must": [
        {"exists": {"field": my_field}},
        {"exists": {"field": "@timestamp"}}]}}})
sid = page['_scroll_id']
scroll_size = page['hits']['total']
while (scroll_size > 0):
    print("Scrolling...")
    # Get the number of results that we returned in the last scroll
    scroll_size = len(page['hits']['hits'])
    print("scroll size: " + str(scroll_size))
    page = Elasticsearch().scroll(scroll_id = sid, scroll = '1s')
    # Update the scroll ID
    sid = page['_scroll_id']
```
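
For comparison, here is a sketch of the same loop restructured under two assumptions that the post does not confirm: that creating a fresh `Elasticsearch()` client for every request may be related to the crash (each client instance sets up its own transport and connections), and that a keep-alive longer than `1s` is acceptable. The function takes one long-lived client, yields every hit (including those on the first page), always uses the most recent `_scroll_id`, and calls `clear_scroll` when it is done. The name `scroll_all` and the `"1m"` default are hypothetical and illustrative, not part of the client library.

```python
def scroll_all(client, index, body, size=1000, scroll="1m"):
    """Yield every hit matching `body`, fetching one scroll page at a time.

    `client` is a single Elasticsearch client created once by the caller
    (hypothetical helper; the "1m" keep-alive default is an assumption).
    """
    page = client.search(index=index, body=body, scroll=scroll, size=size)
    sid = page["_scroll_id"]
    try:
        hits = page["hits"]["hits"]
        # Stop as soon as a page comes back empty.
        while hits:
            for hit in hits:
                yield hit
            page = client.scroll(scroll_id=sid, scroll=scroll)
            # Always continue with the latest scroll ID returned by the server.
            sid = page["_scroll_id"]
            hits = page["hits"]["hits"]
    finally:
        # Release the server-side search context instead of waiting for it to expire.
        client.clear_scroll(scroll_id=sid)
```

With a real cluster this would be used as `es = Elasticsearch()` once, followed by `for hit in scroll_all(es, my_index, body): ...`. The client library also ships `elasticsearch.helpers.scan`, which wraps the same scroll-and-clear pattern and may be simpler to use.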

I found that the line `page = Elasticsearch().scroll(scroll_id = sid, scroll = '1s')` is what triggers the error. I have checked the scroll ID and it is always the same (at least until the error is thrown).
Has anybody run into a similar problem, or does anybody know how to solve it?
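
To get more than "Segmentation fault (core dumped)" out of the crash, one option (an addition, not something from the original post) is Python's `faulthandler` module, which is part of the standard library from Python 3.3 on. Enabling it at the top of the script makes the interpreter print the Python traceback of every thread when it receives a fatal signal such as SIGSEGV, which would show whether the crash really happens inside the `scroll` call and which library frame is active at that moment.

```python
import faulthandler
import sys

# On a fatal signal (e.g. SIGSEGV), write the Python traceback of all
# threads to stderr before the process dies.
faulthandler.enable(file=sys.stderr, all_threads=True)
```

Running the unchanged script as `python3 -X faulthandler script.py` has the same effect without editing the code.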
