Comparison between C*/Spark and ES/Spark concerning data locality

chernals · October 21, 2016, 9:54am

Hey guys,

Following up on my question on GitHub (https://github.com/elastic/elasticsearch-hadoop/pull/819#issuecomment-255216112), it seems that data locality is back for ES/Spark, which is great.

I don't know enough to get a clear view on how that compares with the Spark-Cassandra connector. Would anyone be able to provide more info on that?

Thanks

ebuildy · October 24, 2016, 6:50pm

Can you be more precise? You already have a Cassandra cluster with Spark, and you want to know how this works with Elasticsearch?

chernals · October 26, 2016, 10:37pm

I have already used C* with Spark, but I have very limited experience with ES and/or ES with Spark.

So my questions would be:

Is there a good source of information on the underlying implementation, such as how Spark workers query the ES nodes?
I couldn't find any performance analysis of how the architecture scales in terms of the number of Spark and ES nodes. Does one exist?

Thanks a lot!
