No index compression with "best_compression" in 6.3.2


(Hari Prasad) #1

I am using Elasticsearch 6.3.2 to index log data, which is sent from Logstash 6.3.2. Below is my Logstash pipeline config (output part):

output {
  elasticsearch {
    hosts => "my-host:9200"
    index => "%{[@metadata][index_type]}-%{+YYYY.MM.dd}"
  }
}

I am using a REST call to set the index compression in Elasticsearch, as below:

PUT log-2018.08.30
{
  "settings": {
    "index": {
      "number_of_shards": 64,
      "number_of_replicas": 2,
      "codec": "best_compression"
    }
  }
}

But with this config, I am not able to achieve any index compression; rather, the index is bloating. For a log file of size less than 2 GB, the index size comes to around 2 GB.

Kindly help me understand why there is no compression, or what mistake I have made in the above setup.

Thanks


(Christian Dahlqvist) #2

The best way to enable best_compression is to add it to an index template, so that it is applied to all new indices the template matches.
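For illustration, a 6.x index template carrying this setting could look like the sketch below. The template name and the index pattern are placeholders chosen to match the naming scheme in the pipeline above, not anything from the original thread:

```
PUT _template/log_compression
{
  "index_patterns": ["log-*"],
  "settings": {
    "index": {
      "codec": "best_compression",
      "number_of_shards": 64,
      "number_of_replicas": 2
    }
  }
}
```

Any index whose name matches `log-*` at creation time would then pick up the codec automatically, instead of requiring a manual PUT per daily index.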


(Hari Prasad) #3

Thank you for the suggestion, @Christian_Dahlqvist. But is there any mistake in my current config?

When I get the index settings, I am able to see that the codec is applied to the index.


(Christian Dahlqvist) #4

If you see it in the index settings, it is applied. The size your data takes up on disk will largely depend on how much enrichment you do and how optimised your mappings are. The improved compression applies to the source and, in my experience, usually gives a 10%-20% space saving over the default compression.
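One detail worth noting here (an addition for clarity, not part of the original reply): the codec only affects segments written after the setting is in place. If `best_compression` was applied to an index that already held data, existing segments keep the default codec until they are rewritten, which a force merge can trigger:

```
POST log-2018.08.30/_forcemerge?max_num_segments=1
```

For a daily index that is still being written to, it is usually simpler to set the codec via a template so every new index starts with it from the first segment.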


(Hari Prasad) #5

The mapping I use is as below:

{
  "log_instance-2018.07.31": {
    "mappings": {
      "doc": {
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "@version": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "hostname": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "message": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "source": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "thread": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          }
        }
      }
    }
  }
}

@Christian_Dahlqvist: Does this have any problem that would hamper the compression? Or are there any other settings that I will have to tweak to achieve compression?


(Christian Dahlqvist) #6

I see that you seem to be using the default dynamic mappings. These index every field as both text and keyword, which adds a lot of flexibility but can also take up quite a bit of extra space on disk. I would recommend you go through your mappings and optimise them according to these guidelines.
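As a sketch of what such an optimisation could look like (assuming, for illustration, that `hostname`, `source`, and `thread` only ever need exact-match filtering and never full-text search), mapping them as plain `keyword` avoids indexing each value twice. The template name is a placeholder:

```
PUT _template/log_mappings
{
  "index_patterns": ["log-*"],
  "mappings": {
    "doc": {
      "properties": {
        "@timestamp": { "type": "date" },
        "hostname":   { "type": "keyword", "ignore_above": 256 },
        "source":     { "type": "keyword", "ignore_above": 256 },
        "thread":     { "type": "keyword", "ignore_above": 256 },
        "message":    { "type": "text" }
      }
    }
  }
}
```

Here `message` stays `text` (without a `keyword` sub-field) since it is typically searched rather than aggregated on; whether that trade-off fits depends on how the fields are actually queried.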

The best_compression codec applies to the JSON source, so these mappings will not be affected.


(Hari Prasad) #7

@Christian_Dahlqvist: thank you. I will try to follow the guidelines to optimise the disk usage.
Do you have any suggestion or link which explains exactly how to correctly set the codec? Will setting the codec via a template make sure it is applied?


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.