First of all, I'm pretty new to ELK, so sorry if my questions seem stupid.
I inherited a crashed Elasticsearch. It was a basic/default ELK configuration running on one server and used for demo purposes. There are 3 devices that each send about 1 MB of JSON data per day. ELK was chosen because of the Kibana dashboards. It was configured with the default 5 primary shards and 1 replica, and the Logstash conf was creating one index per day per device.
I think it ran for about 6 months and then it crashed with an out-of-memory error: java.lang.OutOfMemoryError: Java heap space. That server only has 4 GB. I tried to restart it, but I can't. If I look at the cluster health it starts with around 12000 unassigned shards. It slowly turns them into active shards, but at around 6000 it crashes with out of memory again. I couldn't find any solution on the web to restore/restart it.
My first question: can I do anything to recover that data? I can't add more memory to the server so that it completes the shard activation process.
My second question is about configuring a new server for ELK. Still for demo purposes, so not much activity, but there will be around 30-40 devices sending around 1 MB per day each. The new server has 16 GB.
After reading a lot on the forum, it seems to me that saving the data first in an ACID database is strongly recommended. Is this correct?
Then, how should I configure the new Elasticsearch? As this is only one server, is the following correct: 1 node, 1 primary shard, 1 replica?
And lastly, is it OK to still have 1 index per day per device, or is it better to have just 1 index per device?
Oh boy, that's a lot. You need to limit how many of these shards are open at once.
I would start by disabling allocation and then closing any indices you don't need. Then enable allocation again and let it assign all the shards of the indices that you left open.
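Something like the following, using the cluster settings and close-index APIs — a minimal sketch, where `logstash-device1-2017.*` is just a placeholder for whichever daily indices you decide to park:

```
# Stop the cluster from trying to assign shards while you tidy up
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "none"
  }
}

# Close the indices you don't need right now (placeholder pattern)
POST /logstash-device1-2017.*/_close

# Re-enable allocation so the indices you left open can recover
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}
```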
Next, prevent it from creating too many new shards. Here is an article about shard sizing:
If you are using time-based indices then consider longer time periods (e.g. weekly or monthly rather than daily). Also consider reducing the default number of shards from 5, possibly to 1, in your index templates.
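For instance, a sketch of a legacy index template that gives every new matching index a single primary shard — the `devices-*` pattern is an assumption about your naming, and on 5.x the `index_patterns` field is called `template` instead:

```
PUT _template/devices
{
  "index_patterns": ["devices-*"],
  "settings": {
    "index.number_of_shards": 1
  }
}
```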
Then you can work through the closed indices in small batches: open a few, perhaps reindex them into fewer, larger indices, and then delete the originals.
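Roughly what one batch might look like (index names are placeholders again); `_reindex` accepts a wildcard source, so a month of daily indices can be rolled into one:

```
# Open one small batch of closed daily indices
POST /devices-2017.01.*/_open

# Copy them into one larger monthly index
POST _reindex
{
  "source": { "index": "devices-2017.01.*" },
  "dest":   { "index": "devices-2017.01" }
}

# After checking the doc counts in the new index, delete the originals
# (wildcard deletes are refused if action.destructive_requires_name is true)
DELETE /devices-2017.01.*
```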
This is not really the case any more. If you're truly paranoid, maybe keep a copy of the incoming data until it's gone into Elasticsearch and you've taken a snapshot. Recent versions of Elasticsearch are pretty resilient.
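If you do want that safety net, a filesystem snapshot repository is the simplest option. A sketch, assuming you've whitelisted a suitable path via `path.repo` in elasticsearch.yml — the location and names here are examples:

```
# Register a filesystem snapshot repository
PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/elasticsearch"
  }
}

# Take a snapshot of everything currently in the cluster
PUT _snapshot/my_backup/snapshot_1?wait_for_completion=true
```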
You probably want 0 replicas: the primary doesn't count as a replica, and a replica can never be allocated on the same node as its primary, so with only one node you've only got room for the primary copy of each shard.
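On a single node you can drop the replica count on all existing indices so the cluster can go green — a one-liner sketch (and you can put the same setting in your index template for new indices):

```
PUT /_all/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}
```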
Can you put everything from all the devices into 1 index? Lots of tiny shards will cause issues like the heap pressure you're seeing. Certainly 1 index per device per day (around 1 MB per shard) is far too many.
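On the Logstash side that's just a change to the elasticsearch output — a sketch with placeholder host and index names, writing one monthly index shared by all devices (keep the device identifier as a field in each document so you can still filter on it in Kibana):

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # One index per month for all devices, e.g. "devices-2018.05"
    index => "devices-%{+YYYY.MM}"
  }
}
```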
Just wanted to thank you again very much for the help. I've managed to do everything: I followed your steps, recovered the data slowly with snapshots, moved it to the new server, and reindexed everything into a single yearly index.