Hello,
I'm working with Elasticsearch, indexing from Spark through the Spark-to-Elasticsearch connector. I index a batch of documents every 10 seconds (about 200K-300K documents per batch).
The problem is that during peaks, when I have to index 500K documents in 10 seconds, indexing isn't fast enough and the delay keeps growing.
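For reference, the write path looks roughly like the sketch below (simplified; it assumes the elasticsearch-spark connector's saveToEs, and the node names, index name and batch settings are illustrative placeholders, not my exact values):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

// Minimal sketch of the write path using the elasticsearch-spark connector.
// Node names, index name and batch sizes are illustrative placeholders.
val conf = new SparkConf()
  .setAppName("log-indexer")
  .set("es.nodes", "es-node1,es-node2")     // the two ES nodes
  .set("es.batch.size.entries", "5000")     // documents per bulk request
  .set("es.batch.size.bytes", "5mb")        // max size of each bulk request
  .set("es.batch.write.refresh", "false")   // don't refresh the index after every bulk
val sc = new SparkContext(conf)

// Each document has ~6 small fields: the log4j trace plus some metadata.
val docs = sc.parallelize(Seq(
  Map("timestamp" -> "2015-06-01T12:00:00", "level" -> "INFO",
      "host" -> "app01", "logger" -> "com.example.Service",
      "thread" -> "main", "message" -> "request handled in 12 ms")
))

// e.g. one index per day; the index/type name here is illustrative.
docs.saveToEs("logs-2015.06.01/trace")
```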
I have two ES nodes, each with 8 cores, 32GB of RAM and two 500GB hard disks, and I have configured a 28GB heap for the JVM.
I have up to seven indices, each with 5 shards and one replica (the default configuration), and I have created a script to keep only the last seven indices.
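The retention script just deletes the indices older than the newest seven, roughly like this (host and index name are placeholders; the real script lists the existing indices first and keeps the newest seven):

```scala
import java.net.{HttpURLConnection, URL}

// Sketch of the retention step: delete one old index by name.
// Host and index name are placeholders.
def deleteIndex(name: String): Int = {
  val conn = new URL(s"http://es-node1:9200/$name")
    .openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("DELETE")
  conn.getResponseCode   // 200 when the index was deleted
}

// e.g. deleteIndex("logs-2015.05.25")
```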
The documents I insert have 6 fields, all of them analyzed, and the fields are small.
I have been checking CPU, memory and I/O on the ES nodes and on Spark. Spark doesn't seem to have much to do most of the time, so I guess the bottleneck is ES.
I don't know whether I could tune ES in some way. I have disabled replication to see how it behaves; I expected performance to be much better, but I didn't see a big improvement.
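I disabled replication with an update to the index settings, something like this (host and index name are placeholders):

```scala
import java.io.OutputStreamWriter
import java.net.{HttpURLConnection, URL}

// Sketch of how I dropped replicas for the test via the index settings API.
// Host and index name are placeholders.
def setReplicas(index: String, replicas: Int): Int = {
  val conn = new URL(s"http://es-node1:9200/$index/_settings")
    .openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("PUT")
  conn.setDoOutput(true)
  conn.setRequestProperty("Content-Type", "application/json")
  val out = new OutputStreamWriter(conn.getOutputStream)
  out.write(s"""{"index": {"number_of_replicas": $replicas}}""")
  out.close()
  conn.getResponseCode   // 200 when the setting was applied
}

// e.g. setReplicas("logs-2015.06.01", 0) before the test,
// and setReplicas("logs-2015.06.01", 1) to restore the replica afterwards
```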
Should I try with fewer shards, even though I think we are going to add more ES nodes?
Any advice? Does 20K documents per second seem like a reasonable rate? The documents are server logs (log4j traces) plus some extra metadata.