Hello,
I'm working with Elasticsearch, indexing from Spark through the Spark-to-Elasticsearch connector. I index a batch of documents every 10 seconds (about 200K-300K documents per batch).
The problem is that during peaks, when I have to index 500K documents in 10 seconds, indexing isn't fast enough and the delay keeps growing.
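For reference, the write path looks roughly like the sketch below (simplified; it assumes the elasticsearch-spark connector's saveToEs, and the node names, index name and batch settings are illustrative placeholders, not my exact values):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

// Minimal sketch of the write path using the elasticsearch-spark connector.
// Node names, index name and batch sizes are illustrative placeholders.
val conf = new SparkConf()
  .setAppName("log-indexer")
  .set("es.nodes", "es-node1,es-node2")     // the two ES nodes
  .set("es.batch.size.entries", "5000")     // documents per bulk request
  .set("es.batch.size.bytes", "5mb")        // max size of each bulk request
  .set("es.batch.write.refresh", "false")   // don't refresh the index after every bulk
val sc = new SparkContext(conf)

// Each document has ~6 small fields: the log4j trace plus some metadata.
val docs = sc.parallelize(Seq(
  Map("timestamp" -> "2015-06-01T12:00:00", "level" -> "INFO",
      "host" -> "app01", "logger" -> "com.example.Service",
      "thread" -> "main", "message" -> "request handled in 12 ms")
))

// e.g. one index per day; the index/type name here is illustrative.
docs.saveToEs("logs-2015.06.01/trace")
```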
I have two ES nodes, each with 8 cores, 32GB of RAM and two 500GB hard disks, and I have configured a 28GB heap for the JVM.
I have up to seven indices, each with 5 shards and one replica (the default configuration), and I have created a script to keep only the last seven indices.
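The retention script just deletes the indices older than the newest seven, roughly like this (host and index name are placeholders; the real script lists the existing indices first and keeps the newest seven):

```scala
import java.net.{HttpURLConnection, URL}

// Sketch of the retention step: delete one old index by name.
// Host and index name are placeholders.
def deleteIndex(name: String): Int = {
  val conn = new URL(s"http://es-node1:9200/$name")
    .openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("DELETE")
  conn.getResponseCode   // 200 when the index was deleted
}

// e.g. deleteIndex("logs-2015.05.25")
```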
The documents I insert have 6 fields, all of them analyzed, and the fields are small.
I have been checking CPU, memory and I/O on the ES nodes and on Spark. Spark doesn't seem to have much to do most of the time, so I guess the bottleneck is ES.
I don't know whether I could tune ES in some way. I have disabled replication to see how it behaves; I expected performance to be much better, but I didn't see a big improvement.
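I disabled replication with an update to the index settings, something like this (host and index name are placeholders):

```scala
import java.io.OutputStreamWriter
import java.net.{HttpURLConnection, URL}

// Sketch of how I dropped replicas for the test via the index settings API.
// Host and index name are placeholders.
def setReplicas(index: String, replicas: Int): Int = {
  val conn = new URL(s"http://es-node1:9200/$index/_settings")
    .openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("PUT")
  conn.setDoOutput(true)
  conn.setRequestProperty("Content-Type", "application/json")
  val out = new OutputStreamWriter(conn.getOutputStream)
  out.write(s"""{"index": {"number_of_replicas": $replicas}}""")
  out.close()
  conn.getResponseCode   // 200 when the setting was applied
}

// e.g. setReplicas("logs-2015.06.01", 0) before the test,
// and setReplicas("logs-2015.06.01", 1) to restore the replica afterwards
```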
Should I try with fewer shards, even though I think we are going to add more ES nodes?
Any advice? Does 20K documents per second seem like a reasonable rate? The documents are server logs (log4j traces) plus some extra metadata.