Hi @d.silwon, the new index template was applied to my new index today, and it shows codec: best_compression
in the index settings.
Any idea how to verify that codec: best_compression
is actually working?
Hi @wcpoon,
Good question.
First of all, check the new index settings:
GET /index_name-2022.04.01/_settings
Maybe it will be enough to compare the index size over a few days:
GET /_cat/indices/index_name-2022.*?v&s=index
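For example, something like this makes the day-to-day comparison a bit easier to read (columns restricted, sizes in raw bytes; the index pattern is just a placeholder):
GET /_cat/indices/index_name-2022.*?v&s=index&h=index,docs.count,pri.store.size&bytes=b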
No more ideas right now.
Best Regards,
Dan
Hi @d.silwon
After a few days of running with best_compression:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open logstash-2022.04.01-000004-000003 DbksMBHERza3gISPqMAmlA 1 1 114920759 0 60.6gb 30.3gb
green open logstash-2022.04.01-000004-000004 QnEcFMmNSdKuRPejabhH5g 1 1 83416347 0 44.6gb 22.2gb
green open logstash-2022.04.01-000004-2 A4ALCm71Q-6Bq2Rv2BGAUw 1 1 115001745 0 60.8gb 30.3gb
{
"logstash-2022.04.01-000004-000003" : {
"settings" : {
"index" : {
"lifecycle" : {
"name" : "logstash-policy",
"rollover_alias" : "logstash",
"indexing_complete" : "true"
},
"codec" : "best_compression",
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"refresh_interval" : "5s",
"number_of_shards" : "1",
"provided_name" : "logstash-2022.04.01-000004-000003",
"creation_date" : "1648894887761",
"number_of_replicas" : "1",
"uuid" : "DbksMBHERza3gISPqMAmlA",
"version" : {
"created" : "8010199"
}
}
}
}
}
It is taking around 60GB+ for roughly 100 million documents...
Is that normal?
It is hard to tell what is normal or not, as it depends on the size of the documents as well as the mappings used. The mappings can make a huge difference to the size, as they determine how all the fields are indexed, and the default mappings are designed for flexibility rather than storage efficiency. I would recommend going through the link provided earlier around tuning for disk usage.
As @Christian_Dahlqvist wrote, it's hard to tell if it is normal or not. Maybe look at a sample document in your index and consider whether all the fields are needed.
Here is a very old blog post that shows how the size on disk depends on the mappings used and the number of fields indexed. Even though most of the recommendations may no longer be valid, as there have been a lot of enhancements and changes over the years, the principle still applies; it is described in a more up-to-date form in the link that was provided.
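As a rough sketch of the kind of change those resources describe (the template name and index pattern below are only placeholders), mapping dynamic strings as keyword only, instead of the default text plus keyword, together with best_compression, tends to shrink the index:
PUT _index_template/logstash-template
{
  "index_patterns": ["logstash-*"],
  "template": {
    "settings": {
      "index.codec": "best_compression"  // applied to newly created indices only
    },
    "mappings": {
      "dynamic_templates": [
        {
          "strings_as_keyword": {
            "match_mapping_type": "string",
            "mapping": { "type": "keyword" }  // skip the text sub-field for dynamic strings
          }
        }
      ]
    }
  }
}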
@wcpoon what is the size of the file/files from a whole day that you load into the logstash-* index? I would like to compare that with the index size in ELK. Thanks
Hi @d.silwon, I'm pushing my vSphere logs to Logstash, with syslog as the Logstash input and Elasticsearch as the output.
I don't think I can load them into Elasticsearch directly.
No, I just want to compare the source size vs the index size.
I forward the vSphere logs from Log Insight to the ELK stack...
In Log Insight the size is also huge, but compared to Log Insight, the size in ELK is even larger...
Just wondering if there is any way I can reduce the size of the logs in ELK...
I think you should read the blog post mentioned by @Christian_Dahlqvist.
There are a few key things in it which can help you reduce the size of stored logs.
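If you want to see which fields actually dominate the on-disk size, recent versions also have a disk usage analysis API (it runs an expensive scan, so maybe try it on a single index first), e.g.:
POST /logstash-2022.04.01-000004-000003/_disk_usage?run_expensive_tasks=true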
Yup, will go through it.
Appreciate your prompt reply.