Comparison between C*/Spark and ES/Spark concerning data locality


(Cedric H.) #1

Hey guys,

Following up on my question on GitHub (https://github.com/elastic/elasticsearch-hadoop/pull/819#issuecomment-255216112), it seems that data locality is back for ES/Spark, which is great.

I don't know enough to get a clear view of how that compares with the Spark-Cassandra connector. Would anyone be able to provide more info on that?

Thanks


(Thomas Decaux) #2

Can you be more precise? You already have a Cassandra cluster with Spark, and you want to know how this works with Elasticsearch?


(Cedric H.) #3

I have already used C* with Spark, but I have very limited experience with ES, and with ES on Spark.

So the questions could be:

  • Is there a good source of information on the underlying implementation, e.g. how the Spark workers query the ES nodes?

  • I couldn't find any performance analysis of how the architecture scales in terms of the number of Spark and ES nodes.

Thanks a lot!

