The existing Apache Storm EsSpout is designed to use a scroll to search through ES once and then exit. It appears to do this very well and quite reliably.
However, in my application I'd like to be able to run the query repeatedly and stream the results. If there are no documents, the Spout should sit idle for a configurable amount of time before checking again.
Does anyone have any advice they can share on extending EsSpout for this functionality? Before diving in, I wanted to ask whether anyone has explored this extension before and where the pitfalls may lie. Thanks.
This has been a suggested feature for the connector, but it isn't on the roadmap for any time soon. You could add feedback to that issue. Additionally, PRs are always welcome if it's something that interests you.
Wow, that is an old request. Interesting that it's coming from jnioche. I believe he is behind StormCrawler, so it makes complete sense that he wanted to see that. I'll have a look and see how involved the changes are.
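For anyone picking this up, here is a rough sketch of the "re-run the query, idle when empty" behavior described in the question. It is plain Java with no Storm or elasticsearch-hadoop dependency, and everything in it is hypothetical: `RepeatingQueryPoller`, the `query` supplier (standing in for one full scroll pass), `idleMillis`, and the injectable `clock` are illustration names, not part of EsSpout. It assumes the logic would be driven from a spout's `nextTuple()`, so it tracks a "next allowed query time" instead of sleeping and never blocks the calling thread.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Optional;
import java.util.Queue;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical helper; not part of Storm or elasticsearch-hadoop. */
class RepeatingQueryPoller {
    private final Supplier<List<String>> query; // assumption: stands in for one complete scroll pass
    private final long idleMillis;              // configurable wait after an empty result
    private final LongSupplier clock;           // injectable time source, e.g. System::currentTimeMillis
    private final Queue<String> buffer = new ArrayDeque<>();
    private long nextQueryAt = 0L;              // earliest time the query may run again

    RepeatingQueryPoller(Supplier<List<String>> query, long idleMillis, LongSupplier clock) {
        this.query = query;
        this.idleMillis = idleMillis;
        this.clock = clock;
    }

    /** Returns the next buffered document, or empty if idling or nothing was found. Never blocks. */
    Optional<String> poll() {
        if (buffer.isEmpty()) {
            long now = clock.getAsLong();
            if (now < nextQueryAt) {
                return Optional.empty();        // still inside the idle window
            }
            buffer.addAll(query.get());         // re-run the query
            if (buffer.isEmpty()) {
                nextQueryAt = now + idleMillis; // nothing found: back off before the next attempt
                return Optional.empty();
            }
        }
        return Optional.of(buffer.remove());
    }
}
```

A spout could call `poll()` once per `nextTuple()` and emit whatever comes back. Note that the sketch does nothing to avoid re-emitting documents that an earlier pass already returned, so the query itself (or some marker on the documents) would have to handle that.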