Hi,
We're currently facing performance issues with our Elasticsearch cluster and are trying to pinpoint the cause and how we might solve it. Our pipeline is Nxlog -> Logstash broker -> Redis -> 12 Logstash clients -> 9 Elasticsearch data nodes + 1 Elasticsearch master. At some point data processing slows down: the Logstash clients stop consuming from Redis as fast as the broker pushes into it, which causes two things:
1 - delayed indexing: documents can take up to a couple of hours to reach the shards
2 - the Redis queue grows unbounded, reaching up to 30 million documents, until Redis is killed by the OS (the OOM killer)
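As a stopgap we're considering capping Redis memory so the OS OOM killer never has to step in; something along these lines in redis.conf (the 8gb figure is a guess for our host, not a validated value):

# redis.conf sketch: cap memory instead of getting OOM-killed
maxmemory 8gb
# for a queue we'd rather have Redis refuse new writes (the broker
# errors and retries) than evict queued log events
maxmemory-policy noeviction

Would that be sensible, or does it just move the failure to the broker?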
We don't see anything in the Marvel, HQ, or KOPF plugins suggesting the ES nodes are overloaded; everything looks absolutely normal.
So I'd appreciate any help or advice, since nothing we can see helps us identify the problem.
Below is our configuration:
Logstash Broker:
input {
  file {
    type => "syslog_product"
    path => ["/data/product/*"]
    sincedb_path => "/data/sincedb"
  }
}
output {
  stdout {}
  redis {
    host => ["euwest-redis"]
    data_type => "list"
    key => "product:syslog_product"
    type => "syslog_product"
    batch => true
    workers => 8
  }
}
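On the broker side, two things we're unsure about: whether the extra stdout {} output slows the pipeline down, and whether larger Redis batches would help (we watch the backlog with redis-cli llen product:syslog_product). A variant we may try, assuming batch_events is available in our Logstash version (200 is an arbitrary guess):

output {
  redis {
    host => ["euwest-redis"]
    data_type => "list"
    key => "product:syslog_product"
    batch => true
    batch_events => 200   # assumed value; we believe the default is 50
    workers => 8
  }
}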
Logstash clients:
input {
  redis {
    host => "euwest-redis"
    data_type => "list"
    key => "product:syslog_product"
    type => "syslog_product"
    tags => ["product_pri"]
    threads => 8
    batch_count => 200
  }
}
filter {
  grok {
    match => ["message", "%{DATA:hostname} %{DATA:cluster} %{GREEDYDATA:empty} - - - \[(?<timestamp>%{MONTHDAY:day}/%{MONTH:month}/%{YEAR:year}:%{HOUR:hour}:%{MINUTE:minute}:%{SECOND:second}\+%{GREEDYDATA:empty})\] {{ %{DATA:http_request} /%{DATA:snippet}/%{DATA:referer} }} %{DATA:http_code} {{ %{DATA:empty} }} {{ %{DATA:url} }} {{ %{DATA:browser} }} {{ %{DATA:empty} }} {{ %{DATA:client_ip} }} {{ %{DATA:empty} }} {{ %{DATA:empty} }} {{ %{DATA:empty} }} {{ %{DATA:empty} }} {{ %{DATA:session_time} }} {{ %{DATA:empty} }} {{ %{DATA:session_id} }} {{ %{DATA:snippet_id} }} {{ %{DATA:product_version} }} {{ %{DATA:papyrus_revision} }}"]
  }
  mutate {
    replace => [ "@source_host", "%{hostname}" ]
    remove_field => [ "empty", "@source_path", "@source" ]
    convert => [ "snippet", "integer", "session_time", "float" ]
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ssZ" ]
  }
  if "_grokparsefailure" in [tags] { drop {} }
}
output {
  elasticsearch {
    cluster => "G177"
    host => "euwest-elastic"
    port => "9300"
    index => "logstash-%{+YYYY.MM.dd}"
    manage_template => false
  }
}
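Two ideas we're toying with on the client side, in case the bottleneck is Logstash itself rather than ES. First, the grok above runs a lot of %{DATA}/%{GREEDYDATA} captures on every event, and we only drop _grokparsefailure events after mutate and date have already run on them; a reordering we're testing drops failures right after grok and anchors the pattern so mismatches fail fast (the pattern shown is just an illustrative fragment of the full one):

filter {
  grok {
    # the ^ anchor makes non-matching lines bail out early instead of backtracking
    match => [ "message", "^%{DATA:hostname} %{DATA:cluster} " ]
  }
  # drop failures before mutate/date ever touch them
  if "_grokparsefailure" in [tags] { drop {} }
}

Second, the elasticsearch output is at its defaults; assuming flush_size and workers behave the way we think on this output (both values below are guesses), would bigger bulks help?

output {
  elasticsearch {
    cluster => "G177"
    host => "euwest-elastic"
    port => "9300"
    index => "logstash-%{+YYYY.MM.dd}"
    manage_template => false
    flush_size => 5000   # assumed value: bigger bulks, fewer round-trips
    workers => 2         # assumed value
  }
}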
Elasticsearch Node:
cluster.name: G177
node.name: elasticsearch-euwest-qqqq
node.master: false
node.data: true
bootstrap.mlockall: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["MASTER IP"]
network.host: _eth0:ipv4_
path.conf: /etc/elasticsearch
path.data: /ebs/elasticsearch
path.logs: /data/logs/elasticsearch
path.plugins: /usr/share/elasticsearch/plugins
indices.memory.index_buffer_size: 50%
index.translog.flush_threshold_ops: 50000
index.store.type: mmapfs
index.refresh_interval: 10s
indices.fielddata.cache.size: 25%
indices.cluster.send_refresh_mapping: false
index.number_of_replicas: 1
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms
indices.store.throttle.type: none
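On the ES side, the only things we haven't checked yet are bulk thread pool rejections (we plan to watch curl 'euwest-elastic:9200/_cat/thread_pool?v' for rejected bulk tasks) and merge behaviour. If rejections do show up, we'd try something like the following in elasticsearch.yml, values being guesses for our hardware:

# sketch only, not applied yet
threadpool.bulk.queue_size: 500             # assumed value; the 1.x default is much lower
index.merge.scheduler.max_thread_count: 1   # often suggested for spinning disks; assumed to suit ours

Does any of this sound like a plausible culprit, or should we be looking elsewhere?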