I'm using Elasticsearch 5.6.4 with the py-elasticsearch client (5.5.2), but the client methods fail to scroll large result sets (~20k documents). When scan()+scroll() are called, or likewise helpers.scan(), the first page of results is returned, but as soon as the client begins to scroll, the Elasticsearch server responds with 'Validation Failed: 1: scrollId is missing'. Using Wireshark, I can confirm that the client methods send a 'scroll_id' parameter with a value, and that they send it to the correct _search/scroll endpoint, per the REST API scroll pattern. Scrolling also works from the Kibana console (queries are below), so I know the server side can work.
Can anyone spot why the Python client calls fail to scroll documents? I am new to Elasticsearch and have searched far and wide for a way to find out what is causing these failures, but I'm stumped. Thank you!
Code snippet and output are here: https://gist.github.com/niceyeti/c8c9f64b27450d5a9c5d27233beb2c00
These queries work in the Kibana console and demonstrate the simple scroll pattern I want from the Python client:
(1) GET INDEX_NAME/_search/?scroll=5m
{
  "query": {
    "match_all": {}
  }
}
Then, using the _scroll_id value from the returned results:
(2) GET _search/scroll
{
  "scroll": "5m",
  "scroll_id": "asdf123..."
}
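
For reference, the two requests above map onto the Python client roughly as follows. This is only a minimal sketch, not the code from the gist: the function name scroll_all and its parameters (keep_alive, page_size) are made up for illustration, and it assumes the client object exposes search() and scroll() with the keyword arguments shown, as the official Python client does.

```python
def scroll_all(client, index, query, keep_alive="5m", page_size=1000):
    """Yield every hit for `query` using the search + scroll pattern.

    `client` is assumed to be an Elasticsearch client instance (or any
    object with compatible search() and scroll() methods).
    """
    # Request (1): an ordinary search that also opens a scroll context.
    page = client.search(
        index=index,
        body={"query": query},
        scroll=keep_alive,
        size=page_size,
    )
    scroll_id = page.get("_scroll_id")

    # An empty page of hits means the result set is exhausted.
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit
        # Request (2): fetch the next page with the latest scroll id.
        page = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
        # The server may hand back a new id, so always keep the newest one.
        scroll_id = page.get("_scroll_id", scroll_id)
```

With a real client this would be used as, for example, `for hit in scroll_all(Elasticsearch(), "INDEX_NAME", {"match_all": {}}): ...`, where Elasticsearch is imported from the elasticsearch package. The scroll context is left to expire after the keep-alive period rather than being cleared explicitly.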