We're having some performance/stability problems in our cluster while indexing data. In particular, there are two fields with pretty large HTML content that use a custom analyzer, as below. The contents of those fields are around 1 MB - 3 MB.
What we're seeing is nodes dropping out of the cluster frequently while adding docs. Logs show longish garbage collections. The cluster is 5 nodes with a 31 GB heap each.
Any suggestions to make this easier on the cluster? I don't mind it being slow, but instability I want to avoid.
What version of Elasticsearch are you using? It's possible that the indexing is causing memory problems, but far more likely that it's your queries/aggregations.
How many documents are you sending per bulk request? Do you constrain the bulk size so that it doesn't go over n MB per request?
How many concurrent processes/threads are sending bulk requests?
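If you're batching by document count alone, large docs can blow up the request size. A minimal sketch of capping bulk batches by serialized byte size instead — this is an illustration, not the official client helper, and the `max_bytes` parameter name is made up:

```python
import json

def chunk_by_bytes(docs, max_bytes=10 * 1024 * 1024):
    """Yield lists of docs whose combined JSON size stays under max_bytes.

    A single doc larger than max_bytes still goes out in its own batch.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        if batch and batch_bytes + size > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch
```

Each yielded batch can then be sent as one bulk request, so the request size stays bounded regardless of how big individual docs are.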
Ah, I see. I'd try lowering the batch size of Reindex; the default is 1000. If your docs are 1-3 MB, you could be hitting your cluster with 1-3 GB bulk requests, which will definitely make the heap unhappy (it has to buffer up that entire request in young-gen memory before parsing and sending to various shards).
Try setting it to something like 50 to start, and work up from there:
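In the Reindex API the batch size is the `size` setting under `source`. Something like this (index names here are placeholders):

```json
POST _reindex
{
  "source": { "index": "source_index", "size": 50 },
  "dest":   { "index": "dest_index" }
}
```

With 1-3 MB docs, a batch of 50 keeps each scroll/bulk round-trip down to roughly 50-150 MB instead of gigabytes.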