Hmm, that would definitely be a nice solution, but I am a bit worried about side effects that might not be immediately obvious to users: people who never finish iterating over the generator would keep a context alive in Elasticsearch indefinitely, which would definitely be bad.
Ultimately I think the current approach is good in that it promotes the idea that scroll is meant for quick export of data, not for keeping a "cursor" open while you perform an expensive, long-running operation on every document. If that is what you need, I feel you should use some form of background processing with a queue and a pool of workers anyway.
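To make the queue-and-workers suggestion concrete, here is a minimal sketch using only the standard library. The names `drain_and_process`, `hits` and `handle` are made up for illustration and are not part of any client API; `hits` stands in for whatever iterable the scroll produces. The queue is deliberately unbounded so that the loop reading from the scroll never waits on a slow worker, at the cost of holding pending documents in memory. A real setup would more likely use an external, durable queue.

```python
import queue
import threading

_DONE = object()  # sentinel telling a worker to stop


def drain_and_process(hits, handle, num_workers=4):
    """Read `hits` as fast as possible; run the slow `handle` in worker threads.

    `hits` and `handle` are hypothetical placeholders: any iterable of
    documents and any per-document callable.
    """
    pending = queue.Queue()  # unbounded, so the producer never blocks on workers
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            doc = pending.get()
            if doc is _DONE:
                return
            out = handle(doc)  # the expensive part happens off the scroll loop
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Fast loop: only enqueue, so the scroll is consumed without long pauses.
    for doc in hits:
        pending.put(doc)

    # One sentinel per worker, then wait for all of them to finish.
    for _ in threads:
        pending.put(_DONE)
    for t in threads:
        t.join()

    return results  # completion order, not input order
```

With elasticsearch-py this could presumably be called as `drain_and_process(helpers.scan(client, index="my-index"), handle)`, where the index name is a placeholder. The scroll is then only open for as long as it takes to read the hits, not for as long as it takes to process them.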
To improve the user experience, would there be a way to keep a tombstone of a search context so we can give the user more accurate info? "Your scroll timed out, try increasing your scroll parameter" would be so much more helpful in this case, if the overhead is not too big.
Another option would be a streaming API to/from Elasticsearch where this would be done in the coordinating node (potentially the same with bulk), though that sounds to me like more trouble than it's worth...