Add Es Spark Accumulators

Hello :wave:,

I opened a pull request.

These metrics will be very useful for monitoring a Spark application that uses the Elasticsearch for Hadoop connector.

What do you think about it?

Thanks for the PR! That seems like a good idea, assuming you're not updating the accumulators inside of a transformation (I haven't looked at the code yet) since that could double-count things. The whole team has been incredibly busy the last few weeks, but we definitely plan on taking a look at this.
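
To illustrate the concern, here's a minimal sketch using Spark's standard `LongAccumulator` (not the code from this PR). Spark only guarantees exactly-once accumulator updates for updates performed inside actions; updates made inside transformations can be replayed by task retries or by recomputation of an uncached RDD.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("acc-caveat").master("local[*]").getOrCreate()
val sc = spark.sparkContext

val risky = sc.longAccumulator("records-risky")
val rdd = sc.parallelize(1 to 100)

// Risky: the accumulator is updated inside a transformation. If a
// task is retried, or the RDD is recomputed by a second action,
// every element is counted again.
val mapped = rdd.map { x => risky.add(1); x * 2 }
mapped.count()   // risky.value == 100
mapped.count()   // recomputed without caching: risky.value == 200

// Safe: updates made inside an action are applied exactly once per
// successful task, even if the task is retried.
val safe = sc.longAccumulator("records-safe")
rdd.foreach(_ => safe.add(1))
println(s"risky=${risky.value}, safe=${safe.value}")  // risky=200, safe=100
```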

Happy :blush: to know that this feature also seems useful to the core team!

Yes, because of Spark's lazy evaluation, Spark task attempts and the embedded Elasticsearch client's retries can cause the counters to be incremented several times for the same batch or Spark task. But that still makes sense for each metric, because what you monitor is the client/server interaction: the communication can be redundant when the same portion of data is exported or imported more than once, or when the code runs several times because the RDD/DataFrame/Dataset is not persisted in cache.

The export from Elasticsearch (reads) is a Spark transformation, and the import to Elasticsearch (writes) is a Spark action.
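
For example, with the connector's native Scala API (a sketch; the index names here are made up):

```scala
import org.elasticsearch.spark._   // brings esRDD / saveToEs into scope

// Read side: esRDD is lazy. Nothing is fetched until an action runs,
// and each action on an uncached RDD replays the scroll against
// Elasticsearch, so read counters would grow again.
val docs = sc.esRDD("logs/entry")
docs.count()            // first read from Elasticsearch
docs.count()            // second read: the same documents are scrolled again

// Caching avoids the redundant reads (and the redundant counting).
val cached = sc.esRDD("logs/entry").cache()
cached.count()          // reads once
cached.count()          // served from the Spark cache

// Write side: saveToEs submits a job immediately, so it behaves as an
// action, but the client may still retry failed bulk requests.
cached.map { case (_, source) => source }.saveToEs("logs-copy/entry")
```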

I'm eagerly awaiting :star_struck: your code review and feedback so I can adapt and improve the code!
