Best way to write from Apache Spark to ECK

Keith_Massey · September 7, 2023, 10:25pm

Hi @krezno. I am not aware of any off-the-shelf way to write from Spark to Logstash. It might be worth trying es-hadoop to write to your ECK cluster (with es.nodes.wan.only set to true) to get a sense of how good or bad the performance really is, though.
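
For what it's worth, here is a rough sketch of what that could look like from PySpark. It is not tested against ECK: the host name, index name, and credentials are placeholders I made up, and the helper functions are hypothetical. The `es.*` keys are standard es-hadoop settings, and the es-hadoop Spark jar has to be on the job's classpath for the write to work.

```python
def eck_write_options(host, port=9200, user=None, password=None, use_ssl=True):
    """Build es-hadoop options for a cluster reachable only through one URL.

    `host` is a placeholder for whatever address exposes your ECK cluster
    (assumption: a single service or load balancer endpoint).
    """
    opts = {
        "es.nodes": host,
        "es.port": str(port),
        # Only talk to the declared address; skip node discovery, since the
        # discovered pod addresses would not be reachable from Spark.
        "es.nodes.wan.only": "true",
        "es.net.ssl": "true" if use_ssl else "false",
    }
    if user is not None:
        opts["es.net.http.auth.user"] = user
        opts["es.net.http.auth.pass"] = password or ""
    return opts


def write_to_eck(df, index, opts):
    """Append a Spark DataFrame to `index` through the es-hadoop connector."""
    (
        df.write.format("org.elasticsearch.spark.sql")
        .options(**opts)
        .mode("append")
        .save(index)
    )


# Hypothetical usage; replace every value with your own.
options = eck_write_options("es.example.internal", user="elastic", password="changeme")
# write_to_eck(my_dataframe, "my-index", options)
```

Timing one representative batch with this should be enough to tell whether the single endpoint is really a bottleneck.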

I'm not really familiar with ECK and it's been a while since I used Kubernetes, but I assume the problem is that Elasticsearch is exposed as a service at a single URL, and discovery does you no good since none of the discovered nodes are accessible. Is that right? Or are you running into other problems?

If you do try es-hadoop, keep in mind that it will see your whole Elasticsearch cluster as a single node, so your Hadoop or Spark jobs will fail if that node gets "blacklisted" due to failures writing to it. We have this problem with customers using load balancers as well. You might want to list the same node several times as I described here.
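
To make the "list the same node several times" idea concrete, here is a small sketch that builds an `es.nodes` value repeating one endpoint. The helper name, host, and copy count are made up for illustration, and I have not verified here how a given es-hadoop version treats duplicate entries, so treat it as something to try rather than a guaranteed fix.

```python
def repeat_node(host, port=9200, copies=3):
    """Return an es.nodes string that lists the same endpoint `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return ",".join(f"{host}:{port}" for _ in range(copies))


# Hypothetical endpoint; substitute the URL your ECK service is exposed at.
options = {
    "es.nodes": repeat_node("es.example.internal", copies=3),
    "es.nodes.wan.only": "true",
}
print(options["es.nodes"])
# es.example.internal:9200,es.example.internal:9200,es.example.internal:9200
```

The intent is that a failure against one entry still leaves other entries in the list for the connector to try, even though they all point at the same address.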
