Advice for restore/configuration

Hello,

First of all I'm pretty new to ELK so sorry if my questions seem stupid :slight_smile:

I inherited a crashed Elasticsearch. It was a basic/default ELK configuration running on one server, used for demo purposes. Three devices each send about 1 MB of JSON data per day. ELK was chosen for the Kibana dashboards. It was configured with the default 5 primary shards and 1 replica, and the Logstash config created one index per day per device.

I think it ran for about 6 months and then crashed with an out-of-memory error: java.lang.OutOfMemoryError: Java heap space. That server only has 4 GB. I tried to restart it, but I can't. If I look at the cluster health, it starts with around 12000 unassigned shards. It slowly turns them into active shards, but at around 6000 it crashes with out of memory again. I couldn't find any solution on the web to restore/restart it.

My first question: can I do anything to recover that data? I can't add more memory to the server, so it never completes the shard activation process.

My second question is about configuring a new server for ELK. It's still for demo purposes, so not much activity, but there will be around 30-40 devices each sending around 1 MB per day. The new server has 16 GB.

After reading a lot on the forum, it seems that saving the data in an ACID database first is strongly recommended. Is this correct?

Then, how should I configure the new Elasticsearch? As this is only one server, is the following correct: 1 node, 1 active shard, 1 replica?

And lastly, is it OK to still have 1 index per day per device, or is it better to have just 1 index per device?

Thank you

Oh boy that's a lot :slight_smile: You need to limit how many of these shards are all open at once.

I would start by disabling allocation and then closing any indices you don't need. Then enable allocation again and let it assign all the shards of the indices that you left open.
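A minimal sketch of those steps, assuming a default single node listening on localhost:9200 and daily index names like `device1-2019.01.*` (the actual index names will depend on your Logstash config):

```sh
# Disable shard allocation so the node stops churning on recovery
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{ "persistent": { "cluster.routing.allocation.enable": "none" } }'

# Close the old daily indices you don't need right now
curl -X POST "localhost:9200/device1-2019.01.*/_close"

# Re-enable allocation so the shards of the remaining open indices get assigned
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{ "persistent": { "cluster.routing.allocation.enable": "all" } }'
```

Closed indices keep their data on disk but consume essentially no heap, which is what buys you room to start up.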

Next, prevent it from creating too many new shards. Here is an article about shard sizing:

If you are using time-based indices then consider longer time periods (e.g. weekly or monthly rather than daily). Also consider reducing the default number of shards from 5, possibly to 1, in your index templates.
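To reduce the default shard count, you can override it in an index template. A sketch using the legacy template API (the template name `devices` and the pattern `device-*` are placeholders; match them to your own index naming):

```sh
# New indices matching device-* get 1 primary shard and 0 replicas
curl -X PUT "localhost:9200/_template/devices" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["device-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'
```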

Then you can work through the closed indices in small batches: open a few, perhaps reindex them into fewer, larger indices, and then delete the originals.
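For example, one batch might look like this (index names are illustrative; verify the document count in the destination before deleting anything):

```sh
# Open a small batch of closed daily indices
curl -X POST "localhost:9200/device1-2019.01.*/_open"

# Reindex them into one larger (e.g. yearly) index
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": { "index": "device1-2019.01.*" },
  "dest":   { "index": "device1-2019" }
}'

# Once verified, delete the original daily indices to free their shards
curl -X DELETE "localhost:9200/device1-2019.01.01"
```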


This is not really the case any more. If you're truly paranoid, maybe keep a copy of the incoming data until it's gone into Elasticsearch and you've taken a snapshot. Recent versions of Elasticsearch are pretty resilient.

You probably want 0 replicas: the primary doesn't count as a replica, so if you've only one node then you've only got room for a primary.
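For indices that already exist, you can drop the replica count with a settings update, roughly like this:

```sh
# On a one-node cluster the replica copies can never be assigned anyway,
# so set replicas to 0 on all existing indices
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{ "index": { "number_of_replicas": 0 } }'
```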

Can you put everything from all the devices into 1 index? Lots of tiny shards will cause issues like the heap pressure you're seeing. Certainly 1 index per device per day (1 MB per shard) is too many.


Thank you for the answers. I will try what you said about recovery right away.

Good to know about the snapshots; I had missed the fact that they are incremental.

Ok for 0 replicas.

I think one index is fine, too. Probably one index per year is enough for the demo traffic.

Thank you


Hi David,

Just wanted to thank you again very much for the help. I managed to do everything: I followed your steps, recovered the data gradually using snapshots, moved it to the new server, and reindexed everything into a single yearly index.

Best,
Tudor


You're welcome @tudro, and thanks for letting us know. I hope everything runs smoothly for you from now on.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.