Index tuning

Hi everyone,

We have a datacenter and I want to centralize logs with ELK. I have set up an ELK stack (a 6-node Elasticsearch cluster + one Logstash server + one Kibana server), and I am shipping logs from 50 Linux servers to the Logstash server with Filebeat. My problem is that disk usage is high. My index is filebeat-* and I don't think this index layout makes sense. How can I tune indexing and decrease disk usage? Thanks in advance.
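
For reference, the Filebeat side of this setup is essentially an output.logstash section like the one below (the paths, hostname, and port are placeholders for illustration, not my actual config):

filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/*.log             # whichever log files each server ships

output.logstash:
  hosts: ["logstash-server:5044"]  # placeholder hostname; 5044 is the conventional Beats port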

How high is "high" disk usage? Also, what is your average EPS (events per second), and how long have you been indexing?
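
If you are not sure of the EPS, a rough way to estimate it is to divide a daily index's document count by 86,400 seconds; for example (a sketch, assuming daily indices and one document per event):

curl -s 'localhost:9200/_cat/indices/filebeat-*?h=index,docs.count' | awk '{printf "%s  ~%.1f events/sec\n", $1, $2/86400}'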

Hi again,

I have a 6-node Elasticsearch cluster and each node has a 50GB disk. The disks fill up completely after around 5 to 7 days :frowning:

Here is the list of my indices:

curl -s localhost:9200/_cat/indices?v | sort

health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open filebeat-2017.06.10 3B85p754RUmejeQD99nvOw 5 1 4641 0 1.9mb 1.9mb
yellow open filebeat-2017.06.11 nXY2WLoaQ0a2LJaWvFQZCA 5 1 3807 0 1.7mb 1.7mb
yellow open filebeat-2017.06.12 pV9ixyUnRd-l4g9s81kzPQ 5 1 4144 0 1.8mb 1.8mb
yellow open filebeat-2017.06.13 Sa6FclpfTdy_D1tFwEAKOg 5 1 3480 0 1.7mb 1.7mb
yellow open filebeat-2017.06.14 --1-WtNyQpWHI2OhwxRGjg 5 1 5788 0 2.4mb 2.4mb
yellow open filebeat-2017.06.15 5ZXBo-FjRb2V8WX9h2jcTw 5 1 5414 0 2.3mb 2.3mb
yellow open filebeat-2017.06.16 kXikL4JfToKMu23LQa2bZw 5 1 5406 0 2.3mb 2.3mb
yellow open filebeat-2017.06.17 URjRexrbR626L4JwX93wMQ 5 1 7620 0 3.1mb 3.1mb
yellow open filebeat-2017.06.18 CrvDMyNsTreOTmwh7Di-5w 5 1 6023 0 2.4mb 2.4mb
yellow open filebeat-2017.06.19 R8GErI2nQHiCyXEnGhrxqA 5 1 5515 0 2.3mb 2.3mb
yellow open filebeat-2017.06.20 K3T-_tuuRIas0SVhTZRp9w 5 1 5389 0 2.3mb 2.3mb
yellow open .kibana BRpojjNnSFyp76ETCNJ4ug 1 1 5 1 35.6kb 35.6kb
yellow open metricbeat-2017.06.10 J1wIlMMAR0OS-18z5VJIGA 5 1 36036859 0 9.8gb 9.8gb
yellow open metricbeat-2017.06.11 AMavU8KTSZ6PHLhX0QpDPg 5 1 47117289 0 12.7gb 12.7gb
yellow open metricbeat-2017.06.13 n_3V1f92S4eqT29EkGWMOg 5 1 41507444 0 10.5gb 10.5gb
yellow open metricbeat-2017.06.14 ii_SM-RSSRCkVJ7Z4Qqm8A 5 1 46230996 0 11.7gb 11.7gb
yellow open metricbeat-2017.06.15 GQQ6rfSgSxmGi1rLzevhZQ 5 1 46675560 0 11.9gb 11.9gb
yellow open metricbeat-2017.06.17 JKa9js-vRXOYm9r2e60_9Q 5 1 47844479 0 12.2gb 12.2gb
yellow open metricbeat-2017.06.18 KIjih1rKQI2Z0X2Rk3QuDQ 5 1 47252245 0 12.1gb 12.1gb
yellow open metricbeat-2017.06.19 VM67ZUPPQhSeYOsBvOSjzQ 5 1 48068938 0 12.1gb 12.1gb
yellow open metricbeat-2017.06.20 vX24NPQDQoGq3-oEmIzS0g 5 1 48246319 0 12.2gb 12.2gb

And on my Logstash server:

cat 30-elasticsearch-output.conf

output {
  elasticsearch {
    hosts => ["node1:9200", "node2:9200", "node3:9200", "node4:9200", "node5:9200", "node6:9200"]
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

cat 10-syslog-filter.conf

filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

It looks like the vast majority of your data originates from Metricbeat. Have you tried tuning or reducing how frequently the different modules collect data?
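
For example, raising the collection period and trimming metricsets in metricbeat.yml on the monitored servers directly reduces how many documents are indexed per day. This is only a sketch; the module, metricsets, and interval below are illustrative, not taken from your configuration:

metricbeat.modules:
  - module: system
    metricsets: ["cpu", "memory", "filesystem"]   # keep only the metricsets you actually use
    period: 60s                                   # the default is 10s; 60s means roughly 6x fewer documents

Fewer and less frequent metricsets translate fairly directly into smaller metricbeat-* daily indices.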
