Elasticsearch Performance

Hi All,

I'm quite new to Elasticsearch, and very new to the large amounts of data we seem to be collecting, so I have a few problems that hopefully someone can help with.

Background:

  • We're collecting ~5 million documents a day on a 4-node cluster, 2 "hot" and 2 "warm" (yes, it's under-resourced, but I don't want to pay the additional cost of more nodes!)
  • We are using Elastic Cloud for this within AWS; I believe the 2 hot nodes have 8GB RAM and the 2 warm nodes have 4GB each.
  • The storage allocated to the cluster is about 1.5TB, though not all of it appears to be usable.

Writing data to the cluster seems to work fine: I've never seen it in any state other than green, which is good, and the memory/CPU of the boxes doesn't appear to be stressed. However, searches and other queries take ages, literally ages, to run; some never complete at all and just return an error.

One of the common recommendations is to limit the number of shards on the cluster. Each index has 2 shards and I have daily indices, but I'm only looking to keep 90 days' worth at most.
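For reference, here's roughly how I've been checking the index and shard layout from Kibana Dev Tools (the `logs-*` pattern is just a placeholder for our actual index names):

```
# List daily indices with primary/replica counts, doc counts and total size
GET _cat/indices/logs-*?v&h=index,pri,rep,docs.count,store.size&s=index
```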

First question of many: would it be better (for search) to have weekly indices, even though they would be huge?

Thanks

Oliver

It depends on what "huge" means.

In general, try to keep shard sizes between 20GB and 50GB.
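A quick way to see where your shards currently sit relative to that range (again, `logs-*` is just a placeholder for your index pattern):

```
# List shards sorted by on-disk size, largest first
GET _cat/shards/logs-*?v&h=index,shard,prirep,store,node&s=store:desc
```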

Hi David,

At the moment we're getting ~9GB to ~13GB a day; if we combined them into weekly indices, they could be anywhere from 100GB upwards... so, yes, I guess that's what I'd call huge!

Are there any "tweaks" you could recommend to get searches performing better and stop the constant errors?

Thanks

Oliver

I'd recommend looking at https://www.elastic.co/guide/en/elasticsearch/reference/7.9/ilm-rollover.html and defining max_size to, let's say, something like 30GB...
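As a rough sketch (the policy, template, index pattern and alias names below are made up, and the exact phases depend on your hot/warm setup), an ILM policy plus an index template along these lines would roll over to a new index at ~30GB or after a week, and drop indices once they pass your 90-day retention:

```
# Roll over at 30GB or 7 days, delete after 90 days
PUT _ilm/policy/logs-rollover
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "30gb",
            "max_age": "7d"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

# Apply the policy to new indices matching the pattern
PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 2,
      "index.lifecycle.name": "logs-rollover",
      "index.lifecycle.rollover_alias": "logs-write"
    }
  }
}
```

You'd also need to bootstrap the first index (e.g. `PUT logs-000001` with `"aliases": {"logs-write": {"is_write_index": true}}`) and write through the alias rather than directly to dated indices.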
