Comparison between C*/Spark and ES/Spark concerning data locality

Hey guys,

Following up on my question on GitHub (https://github.com/elastic/elasticsearch-hadoop/pull/819#issuecomment-255216112), it seems that data locality is back for ES/Spark, which is great.

I don't know enough to have a clear view of how that compares with the Spark-Cassandra connector. Would anyone be able to provide more info on that?

Thanks

Can you be more precise? Do you already have a Cassandra cluster with Spark, and you want to know how this works with Elasticsearch?

I have already used C* with Spark, but I have very limited experience with ES, alone or with Spark.

So the questions would be:

  • Is there a good source of information on the underlying implementation (how Spark workers query the ES nodes, etc.)?

  • I couldn't find any performance analysis of how the architecture scales in terms of the number of Spark and ES nodes. Does one exist?

Thanks a lot!