How to upgrade my Elasticsearch cluster

Hey guys, hope everybody is doing well!

I'm having a problem upgrading my ES cluster and I'm really lost on how many nodes I need, their roles, and their hardware. Right now I have 2 nodes: one of them is master, data, and ingest, and the other one is data only. Each machine has a total of 500 GB of disk, 32 GB of RAM, and 16 GB of heap.
What I need help with is how to upgrade it. I'm using Graylog and storing 1 month of logs in total (18 indices with 2 shards each and no replicas), consuming 600 GB of disk space with 1,200,000,000 logs in total.
What I need to know is how many nodes I should have so I can expand this one month of logs to 6 months. I was thinking about 3 master nodes, 3 ingest nodes, and 6 data nodes, but I have no idea what hardware they would need, or whether this is really the correct number of nodes.

3 "dedicated master" nodes is the recommendation, which means they aren't used for any other service, and can be smaller in ram, probably a lot smaller in CPU and disk. They need enough heap to store the cluster config.

You are already handling the ingest load today, and adding retention won't add to the ingest load, so you may not need dedicated ingest nodes; they can be used as client/coordinating nodes as well. I would just set all data nodes as ingest too.
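
If it helps, this is roughly how that role split might look in elasticsearch.yml. This is just a minimal sketch, assuming Elasticsearch 7.9+ where node.roles is available; on older versions you'd use the node.master / node.data / node.ingest booleans instead:

```yaml
# Dedicated master node: eligible for master election only, no data or ingest
node.roles: [ master ]

# Data node that also accepts ingest (pipeline) work
node.roles: [ data, ingest ]
```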

Your 6 data nodes could handle a lot more than 600 GB of disk, easily 5 TB I think if you need it, and adding a replica would help avoid data loss. You're probably OK on heap usage, but check; you might want more RAM and heap.
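
One quick way to check is the _cat/nodes API, picking the heap and RAM columns (just an example selection of columns, choose whatever you care about):

```
GET _cat/nodes?v&h=name,heap.percent,heap.max,ram.percent
```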

Thank you very much for your answer!
Is 8-16 GB of RAM enough for the master nodes?

We had a "tie breaker" master with 8Gb ram for a long time, 2 other masters, 1 per rack, are also data nodes. At about 60Tb, 15B docs, 3000 shards, this node stated doing a lot of GC's, the cluster config didn't fit in the heap (5G). Adding ram and using it for more heap would have fixed it, but for our needs, we just made it "voting only".

We had a lot of trouble with an 8 GB heap and 2,000 indices/shards. Note that all master-eligible nodes need the same RAM as far as I know, though voting-only nodes can have less.

Not sure I understand why you have 2 primary but no replica shards; it's easy to lose your data that way. For low-value logs, 1 primary shard and 1 replica is usually simplest, as long as each shard stays under ~50 GB.
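
As an illustration only (Graylog normally controls shard and replica counts through its index set settings, so you'd usually change it there), an Elasticsearch index template for that layout could look like this, assuming ES 7.8+ composable templates and a hypothetical graylog_* index pattern:

```
PUT _index_template/graylog-logs
{
  "index_patterns": ["graylog_*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  }
}
```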

Frankly, if your data is going from 600 GB to ~3 TB (2x that if you add replicas), and this is not a heavily loaded cluster, I'd just go with three nodes, all master/data, with 16 GB heaps (32 GB RAM) and 2-4 TB of disk each, using 1 shard and 1 replica, and see how it goes. The nice thing is you can easily expand or adjust later: your ingest load seems light, and as long as performance is okay you can grow the cluster by adding disk space, another data node or two, etc.
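
For reference, the 16 GB heap is set in jvm.options (or a file under jvm.options.d/ on newer versions); keeping -Xms and -Xmx equal is the standard recommendation:

```
-Xms16g
-Xmx16g
```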

Of course, if you have $$, then do three dedicated masters and 4-6 data nodes, but that seems overkill.

Thank you very much for your answer!
I'm taking your advice and going for 4 nodes with 1.5 TB of disk and 32 GB of RAM each; 3 of them are going to be master and data nodes, and the other one is going to be a data and ingest node.
And just to clarify, we don't use replicas because of the disk space they would consume. Since we can't afford that, I use 2 primary shards for a little better performance than just one.

Please let us know how it goes. And sure, if you can afford to lose data, then replicas are not required, as long as you know the risks (though with ILM people also move older data to spinning disks, freeing up expensive SSDs for replicas).
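
To illustrate that idea (just a sketch; whether you can use ILM this way depends on how Graylog manages its indices, and the policy name and "data: warm" node attribute below are made up for the example), an ILM policy that moves indices to nodes tagged as "warm" after 30 days might look like:

```
PUT _ilm/policy/logs-move-to-warm
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {}
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "require": { "data": "warm" }
          }
        }
      }
    }
  }
}
```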

Replicas are useful for more than just data redundancy. Without replicas, each node in the cluster holds a unique set of data not available on any of the others, which means that if one node goes down, every index with a shard on that node turns red. And you can't index into a red index, so in practice your cluster will stop accepting new indexing requests from Graylog.
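
You can see this happening with the standard cluster APIs, for example:

```
GET _cluster/health
GET _cat/indices?v&health=red
```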

So without replicas you not only risk losing data, but you risk downtime every time a node drops out of the cluster (perhaps due to high load or network hiccups). And it will be impossible to upgrade or do maintenance on a node without taking down the entire cluster.

If the cluster is just for testing or for internal use I guess it's possible to live with the risk of data loss and of unexpected cluster downtime. But if you aim for a robust production environment, with minimal risk of data loss and maximum uptime, then replica shards are a must.
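
If you do decide to add replicas later, existing indices can be changed on the fly (new indices are governed by your template or Graylog index set settings; the graylog_* pattern below is just an example):

```
PUT graylog_*/_settings
{
  "index": { "number_of_replicas": 1 }
}
```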

3 nodes with all roles should work fine for your ~20 GB/day of data (600 GB over ~30 days is about 20 GB/day; at that rate, 6 months of retention is roughly 3.6 TB of primary data).
Source: Best Cluster Concept, Timeseries Data and more

But 4 nodes should also do the trick, so your planned setup sounds good to me :+1:
Do you really need a dedicated ingest node, though? Do you have ingest processors running on these nodes? If not, you could just send data to any node, since they will have to index the replicas anyway. What concerns me is that a single ingest node is also a SPOF, and it's not good to have a SPOF.

Cheers,
defalt.

Without replica shards every data node in the cluster is in practice a single point of failure, so if the goal is to save money (disk) I wouldn't bother with an extra ingest node since the cluster is designed to fail when a node drops out anyway.

Yep, that's why I would never run my cluster without replica shards. Disks are not that expensive, or are you really on an ultra-low budget? Drop the ingest node and use the money you would pay for it on bigger disks for the other nodes.

Totally agreed - I was going to expand on it (though you did a better job) but decided to cut my answer short and just mention the risk, though you're right I failed to emphasize the overall reliability, index reset, and admin pain of no replicas.
