Spark Bulk Import Performance Benchmarks

james.baiera · February 17, 2017, 6:32pm

Client nodes as an explicit node type have gone away, but they still exist in the sense of nodes that have the "master", "data" and "ingest" roles turned off. Technically, every node in Elasticsearch can act as a client node. When the option to target client nodes only is enabled, we search for nodes that have no roles in 5.x.
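To make the "no roles" check concrete, here is a minimal sketch of that filter in Python. It is not the connector's actual code. The `sample` dictionary is made-up data, shaped on the assumption that each node entry carries a `name` and a `roles` list, as the 5.x nodes info API reports them; `client_only_nodes` is a hypothetical helper name.

```python
def client_only_nodes(nodes_info):
    """Return the names of nodes that report no roles (hypothetical helper)."""
    return sorted(
        node["name"]
        for node in nodes_info["nodes"].values()
        if not node.get("roles")  # empty or missing roles list => client-style node
    )


# Made-up sample, assumed to mirror the shape of a nodes info response.
sample = {
    "nodes": {
        "a1": {"name": "es-master-1", "roles": ["master"]},
        "b2": {"name": "es-data-1", "roles": ["data", "ingest"]},
        "c3": {"name": "es-client-1", "roles": []},
    }
}

print(client_only_nodes(sample))
```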

That said, if your cluster is transient with no search load, it might make sense to use the default node targeting, which writes directly to the data nodes. Would you be able to share your job configurations and cluster layout/index settings here? Writing explicitly to data nodes can sometimes be less advantageous with more complex setups (like skewed shard/node sizes, or multi-index writing).
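For reference, the targeting modes discussed above map onto the connector settings `es.nodes.data.only` and `es.nodes.client.only`. The sketch below only builds the option dictionaries; `node_targeting_options` and the mode names are hypothetical, and the commented usage assumes a PySpark DataFrame `df` and an index name that are not from this thread.

```python
def node_targeting_options(mode="data"):
    """Build es-hadoop node targeting options for a given mode (hypothetical helper)."""
    modes = {
        # Default behaviour: write straight to the data nodes.
        "data": {"es.nodes.data.only": "true", "es.nodes.client.only": "false"},
        # Route through nodes with no roles; data-only targeting must be off.
        "client": {"es.nodes.data.only": "false", "es.nodes.client.only": "true"},
        # No role-based filtering at all.
        "any": {"es.nodes.data.only": "false", "es.nodes.client.only": "false"},
    }
    if mode not in modes:
        raise ValueError(f"unknown targeting mode: {mode!r}")
    return dict(modes[mode])


opts = node_targeting_options("client")
print(opts)

# Assumed usage with a PySpark DataFrame named df (not defined here):
# df.write.format("org.elasticsearch.spark.sql").options(**opts).save("my-index")
```

Comparing throughput between the "data" and "client" modes on the same job is probably the quickest way to see which one suits a given cluster layout.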
