We run a very simple Docker-based single-node setup of Elastic 5.4.x to aggregate some logfiles. We have the luxury problem (which probably a lot of people hit at some point) that Elasticsearch works out of the box, without any need to configure it early on. However, this means we have a real problem now: we have gathered one and a half years of logfiles (currently slightly above one billion records) in daily indices with 5 shards each.
You can imagine the consequences.
So what did I do? I read about the reindex API and started reindexing all the older daily indices into monthly ones. I started with 2017-MM-*, but the number of documents was too high and I ran into queue errors. So instead I wrote a script which does the reindexing for each day of a month (pretty much YYYY-MM-01 .. 29/30/31 -> YYYY-MM). It wasn't really fast, but it worked.
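For reference, the per-day loop is roughly the sketch below. The index name prefix and the localhost URL are assumptions based on my setup; the `_reindex` body format is the standard one.

```python
# Sketch of the per-day reindex loop. Index names ("logstash-...")
# and the target URL are assumptions; each body would be POSTed to
# http://localhost:9200/_reindex, waiting for completion before
# starting the next day.
import calendar
import json

def daily_reindex_bodies(year, month, prefix="logstash"):
    """Yield one _reindex request body per day of the given month,
    e.g. logstash-2017-02-01 .. logstash-2017-02-28 -> logstash-2017-02."""
    dest = f"{prefix}-{year:04d}-{month:02d}"
    days = calendar.monthrange(year, month)[1]  # days in this month
    for day in range(1, days + 1):
        source = f"{dest}-{day:02d}"
        yield {"source": {"index": source}, "dest": {"index": dest}}

for body in daily_reindex_bodies(2017, 2):
    print(json.dumps(body))
```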
At least for 2017. When I arrived at 2018-01, however, I ran into a lot of trouble with garbage collection. We have a server metric for this: it literally never fired for the 2017 records (about 500 million records, 1140 shards), but it is a real pain for 2018 (also about 500 million records and currently ~1000 shards). Nothing really changed besides the number of cores (increased from 6 to 18, now reduced to 12 again). Reindexing became slower and slower, to the point where it took about an hour to complete 3 million records.
What is causing this trouble? I tried two things:
sleeping for some minutes between the days
setting the refresh_interval to -1 (i.e. disabling the periodic refresh)
Both did not help.
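For completeness, this is the setting I toggled. The index name and URL are assumptions from my setup; `refresh_interval: -1` is the documented way to disable refresh, and it should be restored to a normal value afterwards.

```python
# Index settings bodies I used around the reindex run. They would be
# sent as PUT http://localhost:9200/logstash-2018-01/_settings
# (index name and URL are assumptions for illustration).
import json

disable_refresh = {"index": {"refresh_interval": "-1"}}   # before reindexing
restore_refresh = {"index": {"refresh_interval": "30s"}}  # after reindexing

print(json.dumps(disable_refresh))
print(json.dumps(restore_refresh))
```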
The hardware is a virtual machine, currently with 12 cores (it was 6 when everything still worked, but that resulted in very heavy load, so I increased the count) and 20 GB RAM.
What is the maximum number of shards you would run on a simple single node?
Our plan is to keep the old indices for about a year; we need the data for that long.
We will implement cron-based aggregation (reindexing of daily indices into monthly ones) for the previous month, maybe on the 10th of each month for safety (so we still have daily indices for some days back), plus deletion of very old monthly indices (older than 12 months).
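The date logic of that cron job could look like the sketch below: given today's date, it names the monthly index to build from the previous month's dailies, and the monthly index that just fell out of the 12-month retention window. The index naming scheme is an assumption.

```python
# Sketch of the planned cron job's date arithmetic (run on the 10th
# of each month). "logstash-YYYY-MM" naming is an assumption.
from datetime import date

def rollup_targets(today, keep_months=12, prefix="logstash"):
    # Previous month: its daily indices get reindexed into one monthly index.
    if today.month > 1:
        prev_year, prev_month = today.year, today.month - 1
    else:
        prev_year, prev_month = today.year - 1, 12
    monthly = f"{prefix}-{prev_year:04d}-{prev_month:02d}"
    # Month that is now older than keep_months: candidate for deletion.
    total = prev_year * 12 + (prev_month - 1) - keep_months
    exp_year, exp_month = divmod(total, 12)
    expired = f"{prefix}-{exp_year:04d}-{exp_month + 1:02d}"
    return monthly, expired

monthly, expired = rollup_targets(date(2018, 6, 10))
print(monthly, expired)  # reindex into the first, DELETE the second
```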
Keeping the default of 5 shards per index, this would be around 55 shards for one year of monthly indices, plus the daily indices for the current month (~40 * 5 = ~200), so ~255 shards in total.
How can I find out the typical size of a shard? We currently have ~500 GB of data on the filesystem. I already know that 5 shards per index is overkill for this amount of data, but if our server can handle it without trouble, I would like to keep it this way.
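One way I could imagine answering the shard-size question myself: ask the cluster via `GET _cat/shards?h=index,shard,prirep,store&bytes=b` and average the store column. The sample output below is made up for illustration; only the `_cat/shards` parameters are the real API.

```python
# Average shard size from _cat/shards output (bytes=b gives raw bytes
# in the last column). The three sample lines are invented data.
sample = """\
logstash-2018-01 0 p 524288000
logstash-2018-01 1 p 498073600
logstash-2017-12 0 p 734003200
"""

sizes = [int(line.split()[-1]) for line in sample.splitlines() if line.strip()]
avg_bytes = sum(sizes) / len(sizes)
print(f"{len(sizes)} shards, average {avg_bytes / 1024**2:.0f} MB per shard")
```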