How to Reduce Received Logs Size in ELK Stack?

Hi @d.silwon, my new index got the new index template applied today, and it shows codec: best_compression in the index settings.

Any idea how to verify that codec: best_compression is actually working?

Hi @wcpoon,

Good question :slight_smile:

First of all, check the new index settings:
GET /index_name-2022.04.01/_settings
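
If you just want the codec, you can narrow the response with the filter_path parameter, e.g.:

GET /index_name-2022.04.01/_settings?filter_path=*.settings.index.codec

which should return only the "codec" : "best_compression" part for that index.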

Maybe it will be enough to compare the index size across a few days:
GET /_cat/indices/index_name-2022.*?v&s=index
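
You can also pick just the columns you need with the h parameter (the column names match the _cat/indices headers), e.g.:

GET /_cat/indices/index_name-2022.*?v&s=index&h=index,docs.count,pri.store.size,store.size

and then compare pri.store.size per day before and after the codec change.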

No more ideas right now.

Best Regards,
Dan

Hi @d.silwon
After a few days of running with best_compression:

health status index                             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   logstash-2022.04.01-000004-000003 DbksMBHERza3gISPqMAmlA   1   1  114920759            0     60.6gb         30.3gb
green  open   logstash-2022.04.01-000004-000004 QnEcFMmNSdKuRPejabhH5g   1   1   83416347            0     44.6gb         22.2gb
green  open   logstash-2022.04.01-000004-2      A4ALCm71Q-6Bq2Rv2BGAUw   1   1  115001745            0     60.8gb         30.3gb
{
  "logstash-2022.04.01-000004-000003" : {
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "logstash-policy",
          "rollover_alias" : "logstash",
          "indexing_complete" : "true"
        },
        "codec" : "best_compression",
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "refresh_interval" : "5s",
        "number_of_shards" : "1",
        "provided_name" : "logstash-2022.04.01-000004-000003",
        "creation_date" : "1648894887761",
        "number_of_replicas" : "1",
        "uuid" : "DbksMBHERza3gISPqMAmlA",
        "version" : {
          "created" : "8010199"
        }
      }
    }
  }
}

It is taking around 60GB+ (30GB primary) for ~100M documents...
Is that normal?

It is hard to tell what is normal or not, as it depends on the size of the documents as well as the mappings used. The mappings can make a huge difference to the size, as they determine how all the fields are indexed, and the default mappings are designed for flexibility rather than storage efficiency. I would recommend going through the link provided earlier on tuning for disk usage.
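
As a rough sketch of the kind of change described there (the template name below is just an example, and mapping strings as keyword only makes sense if you don't need full-text search on them):

PUT _index_template/logstash-slim-example
{
  "index_patterns": ["logstash-*"],
  "template": {
    "settings": {
      "index.codec": "best_compression"
    },
    "mappings": {
      "dynamic_templates": [
        {
          "strings_as_keyword": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        }
      ]
    }
  }
}

This avoids the default dynamic-mapping behaviour of indexing every string twice (as text plus a keyword sub-field).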


As @Christian_Dahlqvist wrote, it's hard to tell if it is normal or not. Maybe look at a sample document in your index and consider whether all the fields are needed.
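
For example, grab one document:

GET /logstash-2022.04.01-000004-000003/_search?size=1

and if there are fields you never search or visualize, you can drop them at ingest time in Logstash, roughly like this (the field names are only placeholders):

filter {
  mutate {
    remove_field => ["example_unused_field", "another_unused_field"]
  }
}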


Here is a very old blog post that shows how the size on disk depends on the mappings used and the number of fields indexed. Even though most of the recommendations may no longer be valid, as there have been a lot of enhancements and changes over the years, the principle still applies; it is described in a more up-to-date form in the link that was provided.
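
On 8.x you could also try the analyze index disk usage API (still a technical preview) to see which fields take the most space, e.g.:

POST /logstash-2022.04.01-000004-000003/_disk_usage?run_expensive_tasks=true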

@wcpoon, what is the size of the file(s) from a whole day that you load into the logstash-* index? I would like to compare that to the index size in ELK. Thanks

Hi @d.silwon, I'm pushing my vSphere logs to Logstash; the input to Logstash is syslog and the output goes to Elasticsearch.

I don't think I can load them directly into Elasticsearch.
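
The pipeline is roughly like this (the port and hosts are placeholders for my actual values; the alias and policy names are the ones from my index settings above):

input {
  syslog {
    port => 514
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    ilm_rollover_alias => "logstash"
    ilm_policy => "logstash-policy"
  }
}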

@wcpoon

No, I just want to compare the source size vs. the index size.

I forward the vSphere logs from Log Insight to the ELK stack...
In Log Insight the file size is also huge, but compared to Log Insight, the size in ELK is larger...
Just wondering if there is any way I can reduce the size of the logs in ELK...

I think that you should read the blog post mentioned by @Christian_Dahlqvist.
There are a few key things in it which can help you reduce the size of the stored logs.
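
One concrete example: since your indices are already managed by ILM (logstash-policy), the policy can force-merge older indices down to a single segment with best_compression. A sketch (min_age is only an example; merge this into your existing logstash-policy rather than replacing it blindly):

PUT _ilm/policy/logstash-policy
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "1d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1,
            "index_codec": "best_compression"
          }
        }
      }
    }
  }
}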

Yup, will go through it.
Appreciate your prompt reply. :slightly_smiling_face:
