Tuning Elasticsearch - massive virtual memory timeouts

I haven't had time to look at this in a while, and it looks like it's way out of control - I need help/recommendations on tuning this before replicating to another node.
I've got winlogbeat indices going back over a year, and I'm getting all kinds of timeouts, etc.
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
63868 logstash 39 19 13.2G 3372M 21368 S 5.2 1.3 35:41.60 /usr/bin/java -Xms3g -Xmx3g
63818 elasticse 20 0 2.4T 36.8G 18.0G S 5.2 14.6 1:17.63 /usr/bin/java -Xms16g -Xmx16g

curl -X GET "localhost:9200/_cluster/stats?human&pretty"
https://pastebin.com/bJ2MCrh2

It's a mess! Who's up for a challenge????? :grinning:

You have 1800 shards on a single node with 2.3TB of data and 16GB of heap, which averages out at roughly 1.3GB per shard on disk and 112 shards per GB of heap. This article recommends aiming for shards that are tens of GB in size (20-40GB) and keeping to around 20 shards per GB of heap. I think this is a simple case of having too many shards with not enough data in them.
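
If you want to see the breakdown for yourself, the cat APIs will show it (the sort parameter is optional, and adjust the host/port to your setup):

curl "localhost:9200/_cat/allocation?v"
curl "localhost:9200/_cat/shards?v&s=store:desc"

The second command lists every shard along with its size on disk, so you can confirm how small most of them are.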


OK, cool! I read your link and watched this video -

...and it seems like the main problem for most people is too many shards.

Currently I have time-based daily indices; each index is ~5 GB with 3-4,000,000 docs and 10 shards (no idea where the 10 came from).

1800 shards total - and based on what the articles said, I should have around 58 shards total right now: 2,300GB (store total) / 40GB (per shard).

First step is to modify the current Winlogbeat template to 2 shards going forward (adding a node to the cluster, so going with 2) and to move refresh_interval from 1s to 30s.

PUT _template/winlogbeat
{
  "template": "winlogbeat-*",
  "settings": {
    "number_of_shards":   2,
    "number_of_replicas": 1,
    "refresh_interval": "30s"
  },
  "aliases": {
    "winlogbeat": {}
  }
}

Does this look right?

And is there a way to shrink more than one index at a time?

Thank you for your help!!

5GB of data per day isn't really enough data to justify daily indices. Weekly indices with 1 shard would be ~35GB according to your figures, which seems about right. Using a replica means that search traffic can be balanced across both nodes, even with a single shard.

In the sense of the shrink index API (i.e. PUT /index/_shrink/target-index)? I didn't think there was any restriction on calling that API, although there will be some limits in Elasticsearch that mean it queues up some of the shrinks so as not to overload itself. That said, your system is already overloaded, so I would recommend proceeding slowly and not trying to do too many things at once.
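
If you do go down the shrink route, note that the source index needs to be made read-only first (and on a single node it's simplest to drop its replicas to 0 so the index is fully allocated), and that shrink takes a target index name. A rough sketch for a single index (the index names here are just examples) would be:

curl -X PUT "localhost:9200/winlogbeat-2018.01.01/_settings" -H 'Content-Type: application/json' -d'
{
  "index.blocks.write": true,
  "index.number_of_replicas": 0
}'

curl -X POST "localhost:9200/winlogbeat-2018.01.01/_shrink/winlogbeat-2018.01.01-shrunk" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}'

On a single node you don't need to worry about relocating all the shard copies onto one node first, since they're all in one place already. The target shard count must be a factor of the source's (1 always works).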

I would also recommend that you consider using reindex instead, to convert your existing daily indices into weekly ones.

I should have mentioned that you can relieve a bunch of heap pressure by closing any indices you don't need for now, and this will help to make your system more stable. Then you can open a few, reindex them, delete the originals, and repeat, avoiding having all 1800 shards open at once.

OK!

  1. Moving winlogbeat to weekly indices -
    In the logstash conf.d folder I switched my elasticsearch output from:

index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"

to:

index => "%{[@metadata][beat]}-%{+xxxx.ww}"

...that seems to work great.
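
For reference, the relevant part of my elasticsearch output block now looks roughly like this (the hosts value is just a placeholder for my setup):

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][beat]}-%{+xxxx.ww}"
  }
}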

  2. Closing open indices for temporary relief
    Got a status of winlogbeat indices with this:

curl http://localhost:9200/_cat/indices/winlogbeat-*?v

and closed them like this:

curl -X POST "localhost:9200/winlogbeat-2018.01.*/_close"

...worked great.
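
When I'm ready to reindex a batch, I'm assuming I can reopen them the same way, just with _open:

curl -X POST "localhost:9200/winlogbeat-2018.01.*/_open"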

  3. QUESTION on moving old indices from daily to weekly:
    If I reindex like this -
curl -XPOST "http://alpha:9200/_reindex" -d'
{
   "source": {
     "index": "winlogbeat-2018.01.*"
   },
   "dest": {
     "index": "winlogbeat-2018.01"
   }
}'

BUT is this monthly or weekly?

That looks monthly: winlogbeat-2018.01.* will match all the daily indices from January 2018. There's no easy one-shot way to do this using wildcards, because computers make time too difficult. Maybe monthly is ok for old data, for the sake of simplicity. Alternatively, note that you don't have to do the whole reindex in one single command, so you could work one day at a time:

curl -XPOST "http://alpha:9200/_reindex" -d'
{
   "source": {
     "index": "winlogbeat-2018.01.01"
   },
   "dest": {
     "index": "winlogbeat-2018.01"
   }
}'

curl -XPOST "http://alpha:9200/_reindex" -d'
{
   "source": {
     "index": "winlogbeat-2018.01.02"
   },
   "dest": {
     "index": "winlogbeat-2018.01"
   }
}'

etc. Probably best not to kick off too many of these at once.
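
If you'd rather not keep a terminal open for each one, reindex can also run as a background task that you poll via the tasks API (the task id below is just a placeholder for the value returned in the response):

curl -XPOST "http://alpha:9200/_reindex?wait_for_completion=false" -H 'Content-Type: application/json' -d'
{
   "source": {
     "index": "winlogbeat-2018.01.03"
   },
   "dest": {
     "index": "winlogbeat-2018.01"
   }
}'

curl "http://alpha:9200/_tasks/<task_id>?pretty"

The same advice applies either way: only run a few at a time.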

Hmm, OK so I guess there's no way to reindex to weekly? Just to keep everything the same?

Thank you so much for your help on this one, definitely learned a lot.

Why not go to monthly indices for older data and adjust the number of primary shards to get a reasonable average shard size?

That would be fine...I was more curious than anything.

Ok so reindexing all of winlogbeat-2018.01.* into winlogbeat-2018.01 BUT one day at a time.

Ran this to find out how much January has in its 'store':

curl http://localhost:9200/_cat/indices/winlogbeat-2018.01*?v

Looks like there's about 133GB in January, so 5 shards in winlogbeat-2018.01 should be fine (roughly 27GB per shard).
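
Since the winlogbeat-* template would otherwise apply its own shard count to the new index, I'm creating winlogbeat-2018.01 up front with 5 shards (assuming explicit settings at creation time override the template):

curl -X PUT "http://localhost:9200/winlogbeat-2018.01" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}'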

Starting the process

curl -XPOST "http://localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
"source": {
"index": "winlogbeat-2018.01.01"
},
"dest": {
"index": "winlogbeat-2018.01"
}
}'

and then
curl -XDELETE "http://localhost:9200/winlogbeat-2018.01.01"

This seems to shrink everything drastically.
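
Before each delete I'm sanity-checking the numbers - the _reindex response reports how many docs it created, and I can watch the running total in the monthly index with something like:

curl "http://localhost:9200/_cat/count/winlogbeat-2018.01?v"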

There is, but it's not as easy as the monthly approach: for each day, in turn, reindex it into the appropriate weekly index. You will need to compute the name of the weekly index yourself. Not sure it's worth the hassle.
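
If you did want weekly for the backlog, a rough, untested sketch (assuming a bash shell and GNU date, whose %G.%V output lines up with the xxxx.ww pattern) would be something like:

# reindex each daily index from January 2018 into the matching ISO-week index
for day in $(seq -w 1 31); do
  src="winlogbeat-2018.01.${day}"
  week=$(date -d "2018-01-${day}" +%G.%V)
  curl -XPOST "http://alpha:9200/_reindex" -H 'Content-Type: application/json' -d"
  {
    \"source\": { \"index\": \"${src}\" },
    \"dest\": { \"index\": \"winlogbeat-${week}\" }
  }"
done

Since each _reindex call waits for completion by default, this runs them one at a time anyway, which keeps the load down.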
