No index compression with "best_compression" in 6.3.2


(Hari Prasad) #1

I am using Elasticsearch 6.3.2 to index log data, which is sent from Logstash 6.3.2. Below is my Logstash pipeline config (output part):

output {
  elasticsearch {
    hosts => "my-host:9200"
    index => "%{[@metadata][index_type]}-%{+YYYY.MM.dd}"
  }
}

I am using a REST call to set the index compression in Elasticsearch, as below:

PUT log-2018.08.30
{
  "settings": {
    "index": {
      "number_of_shards": 64,
      "number_of_replicas": 2,
      "codec": "best_compression"
    }
  }
}

But with this config, I am not able to achieve any index compression; rather, the index is bloating. For a log file of size less than 2 GB, the index size comes to around 2 GB.

Kindly help me understand why there is no compression, or what mistake I have made in the above setup.

Thanks


(Christian Dahlqvist) #2

The best way to enable best_compression is to add it to an index template, so that it is applied to all new indices the template matches.
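For illustration, a 6.x index template carrying this setting could look like the sketch below. The template name and the index pattern are placeholders chosen to match the naming scheme in the pipeline above, not anything from the original thread:

```
PUT _template/log_compression
{
  "index_patterns": ["log-*"],
  "settings": {
    "index": {
      "codec": "best_compression",
      "number_of_shards": 64,
      "number_of_replicas": 2
    }
  }
}
```

Any index whose name matches `log-*` at creation time would then pick up the codec automatically, instead of requiring a manual PUT per daily index.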


(Hari Prasad) #3

Thank you for the suggestion, @Christian_Dahlqvist. But is there any mistake in my current config?

When I get the index settings, I am able to see that the codec is applied to the index.


(Christian Dahlqvist) #4

If you see it in the index settings, it is applied. The size your data takes up on disk will largely depend on how much enrichment you do and how optimised your mappings are. The improved compression applies to the source and, in my experience, usually gives a 10%-20% space saving over the default compression.
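One detail worth noting here (an addition for clarity, not part of the original reply): the codec only affects segments written after the setting is in place. If `best_compression` was applied to an index that already held data, existing segments keep the default codec until they are rewritten, which a force merge can trigger:

```
POST log-2018.08.30/_forcemerge?max_num_segments=1
```

For a daily index that is still being written to, it is usually simpler to set the codec via a template so every new index starts with it from the first segment.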


(Hari Prasad) #5

The mapping I use is as below:

{
  "log_instance-2018.07.31": {
    "mappings": {
      "doc": {
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "@version": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "hostname": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "message": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "source": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "thread": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          }
        }
      }
    }
  }
}

@Christian_Dahlqvist: Does this have any problem that would hamper the compression? Or are there any other settings that I will have to tweak to achieve compression?


(Christian Dahlqvist) #6

I see that you seem to be using the default dynamic mappings. These index every field as both text and keyword, which adds a lot of flexibility but can also take up quite a bit of extra space on disk. I would recommend you go through your mappings and optimise them according to these guidelines.
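As a sketch of what such an optimisation could look like (assuming, for illustration, that `hostname`, `source`, and `thread` only ever need exact-match filtering and never full-text search), mapping them as plain `keyword` avoids indexing each value twice. The template name is a placeholder:

```
PUT _template/log_mappings
{
  "index_patterns": ["log-*"],
  "mappings": {
    "doc": {
      "properties": {
        "@timestamp": { "type": "date" },
        "hostname":   { "type": "keyword", "ignore_above": 256 },
        "source":     { "type": "keyword", "ignore_above": 256 },
        "thread":     { "type": "keyword", "ignore_above": 256 },
        "message":    { "type": "text" }
      }
    }
  }
}
```

Here `message` stays `text` (without a `keyword` sub-field) since it is typically searched rather than aggregated on; whether that trade-off fits depends on how the fields are actually queried.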

The best_compression codec applies to the JSON source, so these mappings will not be affected.


(Hari Prasad) #7

@Christian_Dahlqvist: thank you. I will try to follow the guidelines to optimise the disk usage.
Do you have any suggestion or link which explains exactly how to correctly set the codec? Will setting the codec via a template make sure it is applied?


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.