Elasticsearch is not really capable of streaming data the way Kafka does. The es-hadoop connector simply makes it possible to ship bulk data between Elasticsearch and other Hadoop-ecosystem technologies. Because of this, it is unwise to expect that reading from Elasticsearch with Spark Streaming or Storm will behave like consuming an event stream from Kafka. The connector uses the scroll API, with an optionally provided query, to page through a point-in-time snapshot of the index. Once it finishes reading all of the data returned by the scroll, the data source is exhausted and the spout will idle.
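To illustrate why the spout goes idle, here is a minimal, purely illustrative sketch (not the es-hadoop source) of scroll-style reading: the reader pages through a fixed result set in batches, and once every batch has been returned there is nothing left to tail.

```python
# Illustrative sketch of scroll-style pagination. A scroll walks a finite,
# point-in-time result set in batches; when the set is drained, the
# iterator simply ends -- there is no ongoing stream to consume.

def scroll(hits, batch_size):
    """Yield batches of `hits` until the result set is exhausted,
    mimicking repeated calls to the Elasticsearch scroll API."""
    for start in range(0, len(hits), batch_size):
        yield hits[start:start + batch_size]

docs = [{"id": i} for i in range(10)]  # a finite snapshot, not a stream
batches = list(scroll(docs, batch_size=4))
# Three batches (4 + 4 + 2), then nothing more: the source is drained.
```

Contrast this with a Kafka consumer, which blocks and waits for new records rather than terminating when it catches up.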
In your case, it sounds like you are looking for a lambda-style architecture. One way to approach this is to stream data out of Kafka into two places: into Elasticsearch for serving the raw data, and into an ML pipeline for producing enriched data. The enriched data can then be pushed into Elasticsearch as well, to serve whatever applications or dashboards you require. If you ever need to perform a bulk rebuild of the enriched data, or retrain your machine learning model, the entire raw data corpus is available in Elasticsearch.
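A hedged sketch of that fan-out, with in-memory lists standing in for the two Elasticsearch indices and a placeholder `enrich()` standing in for the ML pipeline (all names here are illustrative assumptions, not real APIs):

```python
# Fan-out sketch: each raw event from the stream is written to two sinks --
# one holding raw documents (standing in for a raw-data Elasticsearch
# index) and one fed through the enrichment/ML step.

raw_index = []       # stands in for the raw-data index
enriched_index = []  # stands in for the enriched-data index

def enrich(event):
    """Placeholder for the ML pipeline; here it just tags the event."""
    return {**event, "enriched": True}

def handle(event):
    raw_index.append(event)               # serve the raw data as-is
    enriched_index.append(enrich(event))  # serve model output alongside it

for event in ({"user": u} for u in ("a", "b")):  # stands in for the Kafka stream
    handle(event)

# Because the raw corpus is retained, a bulk rebuild of the enriched data
# (or a retrain) can simply replay it:
rebuilt = [enrich(e) for e in raw_index]
```

The key design point is that the raw sink is never derived from the enriched one, so the enrichment side can always be rebuilt from scratch.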
Since you mention using streaming tools for machine learning, I assume you are looking to run model-based, real-time analysis on the data. Depending on the machine learning approach you plan to execute, your mileage may vary with this strategy.
Hope this helps. As always with architectural advice, take it with a grain of salt!
P.S. - Cross-posting from SO is probably not the best way to interact with the forums. I would replicate the original question here (even if it's without pictures!) so that the entire conversation is available in one thread. Thanks!