Centralized logging with ELK Stack - Sizing question

Hi,

I am very new to the fantastic world of Elastic and the whole ELK Stack, so sorry for any dumb question that might pop up in this post.

I am currently working on a project where we are looking into replacing our existing central log environment, which consists of syslogd on FreeBSD on ZFS with gzip-9 compression.
It does the job, plain and simple, but working with the logs is a big hassle.

We would like to replace that solution with a shiny new ELK Stack.

Current environment:
Daily logs: Approx. 13.6GB of compressed logs on ZFS with a compression ratio of about 15x using gzip-9, which gives a rough estimate of 200GB/day of uncompressed logs.

Target:
Hot storage 7 days (Approx 1.5TB SSD Storage)
"Archive": logs >7 days and < 366 days (Approx 71TB HDD Storage)

Design:
2x NGINX Loadbalancers with VRRP (Keepalived)
2x Logstash
1x Master node
2x Master / Data nodes (hot): 64GB RAM / 8 cores, 31GB heap
2x Data nodes (warm / archive): 64GB RAM / 8 cores, 31GB heap
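As far as I understand, the hot/warm split is done with node attributes and allocation filtering, roughly like this (box_type is just an example attribute name, and the index name is made up):

# each data node gets a node attribute in elasticsearch.yml, e.g.
#   node.attr.box_type: hot    (on the hot master/data nodes)
#   node.attr.box_type: warm   (on the archive data nodes)
# indices older than 7 days can then be moved to the warm nodes with:
curl -XPUT 'http://localhost:9200/logstash-2018.01.01/_settings' \
  -H 'Content-Type: application/json' \
  -d '{ "index.routing.allocation.require.box_type": "warm" }'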

Question 1:
Scenario: Logstash is processing an example Cisco ASA log file with 823,458 rows and a raw file size of 136MB. The resulting Elasticsearch index size becomes 364MB.
Example JSON from a document = 1,007 bytes
RAW message = 141 bytes
Actual file size on disk = 174MB * 2 for both index shards = 348MB, which is fine. Then we have 315MB + 319MB of translogs, so the total used disk space is 980MB, about 7x the size of the raw log file.

Question: Sorry for the stupid question here, but will I need to account for the translog on old indices (previous days, for example), or does it only exist while changes are being made to the index?

Question 2:
Archive design.
I have been testing and thinking about how to lower the storage requirements for our archive. Since we don't have any requirement for fast searches on logs older than 7 days, transparent compression on the filesystem comes to mind.
Does anyone have experience running Elasticsearch on ZFS with gzip-9 compression?
If these numbers don't lie, it does look like we could get fairly substantial savings by doing this.
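For the ZFS test further down I simply enabled gzip-9 on the dataset holding the Elasticsearch data path, roughly like this (pool/dataset names are made up):

# create a gzip-9 compressed dataset for the archive data path
zfs create -o compression=gzip-9 tank/es-archive
# or turn it on for an existing dataset (only blocks written afterwards get compressed)
zfs set compression=gzip-9 tank/es-archive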

Test Index on XFS:
root@xxxxx:/elasticsearch/data/nodes/0/indices/TdnOInHMSa-DtCD3FN2mvA# du -h
174M ./0/index
4.0K ./0/_state
315M ./0/translog
488M ./0
4.0K ./_state
174M ./1/index
4.0K ./1/_state
319M ./1/translog
492M ./1
980M .

Same index on ZFS (data compression gzip-9):
root@xxxxxxx:/data/TdnOInHMSa-DtCD3FN2mvA# du -h
87M ./0/translog
1.5K ./0/_state
75M ./0/index
162M ./0
88M ./1/translog
1.5K ./1/_state
76M ./1/index
163M ./1
2.0K ./_state
325M .

The translog is kept around for a while to allow faster recovery based on sequence numbers. You can tune these settings, but the translog should stop adding to the size of an index once the translog retention age has been reached.
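Depending on your Elasticsearch version, the relevant dynamic settings are index.translog.retention.size and index.translog.retention.age, for example (the values shown here are just the documented defaults, so adjust to your needs):

curl -XPUT 'http://localhost:9200/logstash-*/_settings' \
  -H 'Content-Type: application/json' \
  -d '{ "index.translog.retention.size": "512mb", "index.translog.retention.age": "12h" }'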

In order to minimise storage space, you can optimise your mappings and index settings. Some tips and guidance are provided in this blog post, and although it was written for Elasticsearch 5.x and the _all field has since been removed, most of it should still be relevant.
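As a small, hypothetical example of the kind of thing the post covers (field names are made up, and the exact syntax depends on your version, as older versions still require a mapping type): map strings you only filter or aggregate on as keyword rather than both text and keyword, use dedicated types like ip and date where they apply, and disable norms on text fields you never score on.

curl -XPUT 'http://localhost:9200/logstash-test' \
  -H 'Content-Type: application/json' \
  -d '{
    "mappings": {
      "properties": {
        "message":    { "type": "text", "norms": false },
        "source_ip":  { "type": "ip" },
        "host":       { "type": "keyword" },
        "@timestamp": { "type": "date" }
      }
    }
  }'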

I would recommend trying to keep the average shard size quite large, as larger shards tend to compress better. You can also run the force merge API on indices that are no longer being indexed into to reduce the number of segments and save space, and enable best_compression for them as described in the documentation. This should reduce the size of your index.
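As a rough sketch of those two steps (the index name is made up), on an index that is no longer being written to:

# index.codec is a static setting, so the index has to be closed before changing it
curl -XPOST 'http://localhost:9200/logstash-2018.01.01/_close'
curl -XPUT 'http://localhost:9200/logstash-2018.01.01/_settings' \
  -H 'Content-Type: application/json' \
  -d '{ "index.codec": "best_compression" }'
curl -XPOST 'http://localhost:9200/logstash-2018.01.01/_open'

# force merge down to a single segment per shard so the data is rewritten with the new codec
curl -XPOST 'http://localhost:9200/logstash-2018.01.01/_forcemerge?max_num_segments=1'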

If you look at the space savings you got from ZFS compression, most of that was related to the translog, which should not exist for older indices. Once you have gone through the steps above I would expect relatively little to be gained from using ZFS with compression.

Hi Christian,

Thanks a lot for taking the time to help out, it is highly appreciated.

I will look into the translog tuning and also look into optimizing the mappings of the indices.

Regarding the use of best_compression: this is already enabled on the example index, so even with it enabled, ZFS is doing a great job reducing the size even further, about 50% on the actual index files.

I'll spend some time going through the documents you linked to, and see if I get the same savings after the optimisation and tuning.

Take care Christian,

/Robin
