Hey folks, I have a running ELK stack with Filebeat, Logstash and Kibana.
It started to receive logs, especially Apache access and error logs. Unfortunately the data in Elasticsearch gets really big, e.g.:
12GB of access logs become 21GB in Elasticsearch
109MB of access logs become 255MB in Elasticsearch
3.9MB of error logs become 32MB in Elasticsearch
After every "benchmarking" run I deleted the Elasticsearch data completely and measured with
sudo du -sch /var/lib/elasticsearch
So I googled a lot and figured out that setting "index.codec: best_compression" in elasticsearch.yml could help a bit. Trying this I got different results:
109MB of access logs became 384MB in Elasticsearch, which dropped to 246MB after some time
3.9MB of error logs became 32MB in Elasticsearch, which dropped to 14MB after some time.
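I assume the drop over time comes from Lucene merging segments in the background, since best_compression only seems to apply to newly written or merged segments. On an index that is no longer being written to, something like this should trigger the merge manually (the index name is just an example; on older Elasticsearch versions the endpoint is _optimize instead of _forcemerge, I think):

# merge the index down to one segment so all data gets rewritten with the new codec
curl -XPOST "localhost:9200/logstash-2017.01.01/_forcemerge?max_num_segments=1"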
My first question: is this normal?
My second question: what did I do wrong?
My third question: how can I easily reduce the storage footprint?
I know I could disable _source or cut away part of the message field, but I don't know exactly how to do that.
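For the "cut away part of the message" idea, I guess the raw line is redundant once grok has parsed it into fields, so dropping it right after a successful grok should already save quite a bit. Something like this is what I had in mind (only a sketch, not sure it is the right approach):

filter {
  if [type] == "apache-access" and "_grokparsefailure" not in [tags] {
    mutate {
      # keep the parsed fields (clientip, request, response, ...), drop only the raw log line
      remove_field => [ "message" ]
    }
  }
}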
Here is the use case:
We receive Apache access and error logs from web servers, in total about 800MB of log files every day. They should be kept for about 6 - 8 weeks. Elasticsearch does not have to be very fast, high compression is what matters.
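For the retention I was thinking of daily indices (Logstash writes to logstash-%{+YYYY.MM.dd} by default, as far as I know) and simply deleting whole days once they fall out of the 8-week window, either with Curator or a small cron job (index name is just an example):

# delete one day's index once it is older than the retention window
curl -XDELETE "localhost:9200/logstash-2017.01.01"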
Here is the elasticsearch.yml:
cluster.name: mura_test
node.name: Ashigaru
network.host: localhost
index.codec: best_compression
All other settings are at their defaults.
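I guess I can check whether the codec was actually applied to an existing index with something like this (index name is just an example; I am not sure whether a node-level default from elasticsearch.yml shows up here or only explicitly set values, and from what I've read the setting only affects indices created after the change):

# the settings in the response should show "codec": "best_compression"
curl "localhost:9200/logstash-2017.01.01/_settings?pretty"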
And here is the Logstash config (filter and output sections):
filter {
  if [type] == "apache-access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
    }
  }
  if [type] == "apache-error" {
    grok {
      patterns_dir => ["/etc/logstash/patterns"]
      match => { "message" => "\[(?<timestamp>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR})\]\s\[.*:%{LOGLEVEL:loglevel}\]\s\[\w+\s%{NUMBER:pid}\]\s(?<issue>\(\d+\)\D*:\s)?(\[client\s%{HOSTPORT:client}\])%{GREEDYDATA:message}" }
    }
    date {
      match => [ "timestamp", "EEE MMM dd HH:mm:ss.SSSSSS YYYY" ]
    }
    mutate {
      remove_field => [ "timestamp", "[fields][env]", "[beat][version]", "[beat][name]", "input_type" ]
    }
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => "localhost:9200"
  }
}
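I was also wondering whether I should give access and error logs their own daily indices, so old days can simply be deleted for the 6 - 8 week retention. Just a sketch of what I mean (with custom names the default logstash-* template no longer matches, so the template from the PS below would have to cover them):

output {
  elasticsearch {
    hosts => "localhost:9200"
    # one index per type and day, e.g. apache-access-2017.01.01
    index => "%{type}-%{+YYYY.MM.dd}"
  }
}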
PS: I am also thinking of creating an index template for the log indices, but that seems complicated.
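Or maybe it is not that complicated and something like this would already be enough? This is only what I pieced together from the docs (roughly 2.x/5.x syntax, names are just examples), and I am unsure about the _source part, since with _source disabled the original documents can no longer be displayed or reindexed:

# template applied to all new indices whose name matches apache-*
curl -XPUT "localhost:9200/_template/apache_logs" -d '
{
  "template": "apache-*",
  "settings": {
    "index.codec": "best_compression"
  },
  "mappings": {
    "_default_": {
      "_source": { "enabled": false }
    }
  }
}'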