Relationship between Spark tasks and batch size

A quote from the Performance considerations page of the Elasticsearch for Apache Hadoop [8.11] documentation:

If this takes more than 1-2s to be processed, there's no need to decrease it. If it's less than that, you can try increasing it in small steps.

  1. How do I see how long each POST takes to respond? In my Hadoop logs I've noticed error messages like "Maybe Elasticsearch is overloaded?", but I'm tailing the Elasticsearch logs and they remain completely empty. I have Marvel running and I see short spikes in CPU and JVM memory, but nothing alarming.

  2. Can you confirm that 'B', the configuration in bytes, is my SparkConf setting, i.e. conf.set("es.batch.size.bytes", "15mb")? (A fuller sketch follows the quote below.)

The phrase in question: "with a configuration of B bytes"
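For reference, here is a minimal sketch of how these settings end up on a SparkConf with the elasticsearch-spark connector. The master, node address, index name, and sample documents are placeholders, not my real job:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._   // adds saveToEs() to RDDs

    val conf = new SparkConf()
      .setAppName("es-bulk-sizing")
      .setMaster("local[4]")                   // placeholder master
      .set("es.nodes", "localhost:9200")       // placeholder cluster address
      .set("es.batch.size.bytes", "15mb")      // 'B': bulk flush size, per task
      .set("es.batch.size.entries", "5000")    // bulk also flushes on a document count
      .set("es.batch.write.retry.count", "3")  // rejected documents are retried this many times

    val sc = new SparkContext(conf)

    // Each task writing to Elasticsearch keeps its own bulk buffer, so a stage with
    // T concurrent tasks each configured with B bytes can send roughly T * B at once
    // (e.g. 100 tasks * 15mb is on the order of 1.5 GB hitting the cluster together).
    val docs = sc.parallelize(Seq(Map("message" -> "hello"), Map("message" -> "world")))
    docs.saveToEs("my-index")                  // placeholder; older connector versions expect "index/type" here

Since the limits are per task, the other obvious lever is how many tasks write in parallel, e.g. coalescing the RDD before calling saveToEs.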

  1. It says you can monitor rejections in Marvel, but I don't see that metric. How do I find it?
    Answer from Jun22, "Seeing Indexing Rejections":

In such a scenario, monitor Elasticsearch (through Marvel or other plugins) and keep an eye on bulk processing. Look at the percentage of documents being rejected; it is perfectly fine to have some documents rejected, but anything higher than 10-15% on a regular basis is a good indication the cluster is overloaded.
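For reference, the rejection counters that quote refers to can also be read straight from the REST API rather than Marvel. A rough sketch, assuming an unsecured cluster reachable on localhost:9200 (on newer Elasticsearch versions the bulk pool is named "write", so the column names would change):

    import scala.io.Source

    // Dump the per-node bulk thread-pool stats; the rejected column is the one to watch.
    val url = "http://localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected,bulk.completed"
    println(Source.fromURL(url).mkString)

    // A rough rejection percentage is rejected / (completed + rejected); per the quote
    // above, sustained values beyond ~10-15% suggest the cluster cannot keep up with
    // the combined bulk load coming from the Spark tasks.

The same numbers are also available per node under GET _nodes/stats/thread_pool if JSON output is easier to work with.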