Last night during some document processing, a MASSIVE query was run that would have returned something like 30GB worth of data. As a result, the nodes started to die.
Each machine has a 30GB JVM heap and 60GB of RAM in total.
Is there a way to prevent Elasticsearch from killing itself? If it is working on a query that is going to cause an OOM exception, how can I get it to abandon the query instead of committing suicide?
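For anyone landing here with the same question: the closest thing to "abandon the query instead of dying" is the circuit breakers, which reject a request when its estimated memory use crosses a limit. Below is a minimal sketch that builds a cluster settings update for them. The setting names are the standard breaker settings in reasonably recent versions (check the docs for yours), but the percentages, the `localhost:9200` URL and the helper names are only assumptions for illustration. The breakers work from estimates, so they reduce the risk of an OOM rather than guarantee against it.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: replace with your cluster address


def breaker_settings(total="70%", request="40%", fielddata="40%"):
    """Build a body for PUT /_cluster/settings that tightens the circuit breakers.

    The percentages are illustrative values, not recommendations.
    """
    return {
        "persistent": {
            # Upper bound for all breakers combined, as a share of the JVM heap.
            "indices.breaker.total.limit": total,
            # Per-request structures, e.g. memory used while computing aggregations.
            "indices.breaker.request.limit": request,
            # Fielddata loaded onto the heap.
            "indices.breaker.fielddata.limit": fielddata,
        }
    }


def put_cluster_settings(body, es_url=ES_URL):
    """Send the settings to the cluster (hypothetical helper, needs a live node)."""
    req = urllib.request.Request(
        es_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


body = breaker_settings()
print(json.dumps(body, indent=2))
```

Calling `put_cluster_settings(body)` would apply it. Queries that trip a breaker then fail with an error instead of taking the node down, so the client needs to handle that failure.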
I do. The odd thing is that our cluster (it's kind of new) had been running without any indication of failure or strain for 2 weeks, then all of a sudden it failed during that query and has been much less stable ever since.
We have added 2 more data nodes and have seen very little improvement. I thought it was the query that was causing the damage, but now I am just lost.
I will certainly consider it, but it seems odd that the cluster was fully operational and operating well within reasonable limits for several weeks in a pre-production state and for 2 weeks in full production. There haven't been any changes, but it suddenly started failing.
Also, one more question if I can snag you while you're still here.
Is there a "rebalance shards" API I can trigger? I have looked through quite a few of the shard allocation API endpoints and settings, but there doesn't seem to be a "balance" type of command. I added a few nodes, and they did not take on a proportional share of the shards, so 90% of the shards are sitting on the first 3 data nodes (the original 3 nodes).
Well, I would not touch the default settings. Unless you don't have enough disk space on a specific node or are using specific allocation filtering, everything should be nicely balanced.
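To check how uneven things really are, and to nudge a shard by hand if needed, something like the sketch below could help. As far as I know there is no one-shot "rebalance" endpoint, since rebalancing is done automatically by the allocator, but `POST /_cluster/reroute` accepts explicit `move` commands. The sketch assumes your version's cat API supports `GET _cat/shards?format=json`; the sample rows, the index name `docs` and the node names are made up for illustration.

```python
from collections import Counter


def shards_per_node(cat_shards):
    """Count started shards per node from `GET _cat/shards?format=json` rows."""
    return Counter(
        row["node"] for row in cat_shards if row.get("state") == "STARTED"
    )


def move_command(index, shard, from_node, to_node):
    """Build a body for POST /_cluster/reroute that moves one shard copy."""
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": int(shard),  # the cat API returns shard numbers as strings
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
        ]
    }


# Made-up sample of what the cat API might return for a lopsided cluster.
sample_rows = [
    {"index": "docs", "shard": "0", "prirep": "p", "state": "STARTED", "node": "data-1"},
    {"index": "docs", "shard": "1", "prirep": "p", "state": "STARTED", "node": "data-1"},
    {"index": "docs", "shard": "2", "prirep": "p", "state": "STARTED", "node": "data-2"},
    {"index": "docs", "shard": "3", "prirep": "p", "state": "RELOCATING", "node": "data-2"},
]

counts = shards_per_node(sample_rows)
busiest, _ = counts.most_common(1)[0]
reroute_body = move_command("docs", "0", busiest, "data-4")
print(dict(counts))
print(reroute_body)
```

Manual moves only stick if the allocator agrees with them; if allocation filtering or disk watermarks are keeping shards off the new nodes, the cluster may refuse the move or move the shard back.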
Well.. let me correct myself... We index a TON of documents per day. nearly 3m a day.
But we have not created any new indexes. So I guess our document count has changed, but 3 million is a drop in the bucket compared to our overall dataset.
What kind of data is that?
3m per day might be enough to put some pressure on your nodes, especially if you are using fielddata instead of doc values and are aggregating over tons of different values.
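One way to see whether fielddata is what is eating the heap is to look at the per-node fielddata size reported by `GET _nodes/stats/indices/fielddata`. Here is a rough sketch that summarizes such a response. The sample response is invented and trimmed down to the fields used here, and the node names are hypothetical.

```python
def fielddata_by_node(nodes_stats):
    """Map node name to fielddata heap usage in bytes from a node stats response."""
    usage = {}
    for node in nodes_stats.get("nodes", {}).values():
        fielddata = node.get("indices", {}).get("fielddata", {})
        usage[node.get("name", "unknown")] = fielddata.get("memory_size_in_bytes", 0)
    return usage


# Invented, trimmed-down sample of a node stats response.
sample_stats = {
    "nodes": {
        "id-1": {
            "name": "data-1",
            "indices": {"fielddata": {"memory_size_in_bytes": 12884901888, "evictions": 0}},
        },
        "id-2": {
            "name": "data-4",
            "indices": {"fielddata": {"memory_size_in_bytes": 0, "evictions": 0}},
        },
    }
}

usage = fielddata_by_node(sample_stats)
for name, size in sorted(usage.items()):
    print(f"{name}: {size / 1024 ** 3:.1f} GiB of fielddata")
```

If these numbers are a large share of the 30GB heap, mapping the aggregated fields to use doc values (which live on disk and in the filesystem cache rather than on the heap) is the usual way out, though it generally requires reindexing.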