Hello. I have three clients. They send logs via Filebeat to the main server, which runs Logstash. A few days ago there was a spike of roughly 12 M logs. Logstash couldn't process all of them and crashed. Does it have any throttling settings to avoid this situation again?
You need to share the pipeline configuration you are using and your logstash.yml to help us understand what is happening.
What do you run on the Logstash machine, only Logstash or other applications as well? What are the hardware specs? Are you using persisted queues or in-memory queues?
Also, share the logstash log with the errors when it crashed.
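For reference, persisted queues are configured in logstash.yml and look roughly like this (a minimal sketch; the path and size are example values, not recommendations for your setup):
# buffer events on disk instead of in memory, so spikes are absorbed up to the size limit
queue.type: persisted
# directory for the queue files; must be writable by Logstash
path.queue: /usr/share/logstash/data/queue
# cap on the on-disk queue; once it is full, back-pressure is applied to the beats input
queue.max_bytes: 4gb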
Hi @Andrey_RF
In addition to the above.
Did you make any changes / scale up the JVM heap for Logstash? See here.
If you are on a larger host you can certainly go beyond the 8 GB ceiling; I have had to scale above that for some intensive workloads.
And not to point out the obvious, but there are a couple of good sections in the docs on scaling and performance tuning.
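The main tuning knobs from those docs live in logstash.yml; a rough sketch with the default values (illustrative only, they need to be sized against your own workload and heap):
# number of pipeline worker threads; defaults to the number of CPU cores
pipeline.workers: 2
# events each worker pulls per batch; larger batches need more heap
pipeline.batch.size: 125
# milliseconds to wait for a full batch before flushing a smaller one
pipeline.batch.delay: 50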
Pipeline configuration:
input {
beats {
port => 5000
}
}
filter {
if 'django' in [tags] {
grok {
break_on_match => false
match => {
"message" => [
"^%{LOGLEVEL:log-level} %{TIMESTAMP_ISO8601:timestamp}",
"Пользователь - (?<login>[^\;]+)",
"Имя: (?<name>[^\;]+)",
"id - (?<user-id>[^\;]+)",
"email - (?<email>[^\;]+)",
"ip - %{IP:ip}",
'Запрос: "(?<request>[^\"]+)',
'Метод: "(?<method>[^\"]+)',
"Модуль: (?<module>[^\;]+)",
"Функция: (?<func>[^\;]+)"
]
}
}
} else if 'gunicorn' in [tags] {
grok {
match => { "message" => '^%{LOGLEVEL:log-level} %{TIMESTAMP_ISO8601:timestamp}%{GREEDYDATA:message}' }
overwrite => [ "message" ]
}
grok {
break_on_match => false
match => { "message" => [
'^ \n\tСообщение: %{IP:ip} "%{WORD:http-method} (?<http-url>[^\s]+) (?<http-protocol>[^\"]+)" (?<status>[\d]+) "URL: (?<url>[^\"]+)" "[^\"]+";',
'В модуле: (?<module>[^\n]+)'
]
}
}
} else if 'nginx' in [tags] {
grok {
match => {
"message" => [
'^%{IP:client-ip} - - \[(?<timestamp>[\d]+/[\w]+/[\d]+:[\d]+:[\d]+:[\d]+) \+[\d]+\] "%{WORD:http-method} (?<http-url>[^\s]+) (?<http-protocol>[^\"]+)" (?<status-code>[\d]+) (?<bytessent>[\d]+) "(?<refferer>[^\"]+)" "(?<user-agent>[^\"]+)" "-"$',
'^%{IP:client-ip} - - \[(?<timestamp>[\d]+/[\w]+/[\d]+:[\d]+:[\d]+:[\d]+) \+[\d]+\]',
"^(?<timestamp>[\d]+/[\d]+/[\d]+ [\d]+:[\d]+:[\d]+) \[%{LOGLEVEL:log-level}\]"
]
}
}
} else {
grok {
match => [
"message", "(?m)%{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:log-level}\]%{GREEDYDATA:message}", # rabbitmq.info
"message", "\[%{TIMESTAMP_ISO8601:timestamp}: %{LOGLEVEL:log-level}", # celery.log
"message", "%{TIMESTAMP_ISO8601:timestamp}" # попытка просто достать лог
]
}
}
date {
match => ["timestamp", "yyyy-MM-dd HH:mm:ss,SSS", "yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd HH:mm:ss.SSS", "yyyy/MM/dd HH:mm:ss", "dd/MMM/yyyy:HH:mm:ss", "ISO8601"]
timezone => "Europe/Moscow"
remove_field => [ "timestamp" ]
}
}
output {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "logstash-%{[host][hostname]}"
}
}
filebeat.yml:
# ============================== Filebeat inputs ===============================
filebeat.inputs:
- type: log
enabled: true
paths:
- /var/log/eias-web/backend.*.log
tags: ["django"]
multiline.type: pattern
multiline.pattern: '^INFO|^ERROR|^WARNING|^CRITICAL'
multiline.negate: true
multiline.match: after
- type: log
enabled: true
paths:
- /var/log/eias-web/fingerprint.info.log
- /var/log/eias-web/gunicorn.*.log
tags: ["gunicorn", "fingerpirnt"]
multiline.type: pattern
multiline.pattern: '^INFO|^ERROR|^WARNING|^CRITICAL'
multiline.negate: true
multiline.match: after
- type: log
enabled: true
paths:
- /var/log/eias-web/celery.log
tags: ["celery"]
multiline.type: pattern
multiline.pattern: '^\['
multiline.negate: true
multiline.match: after
- type: log
enabled: true
paths:
- /var/log/eias-web/crash.*
- /var/log/eias-web/rabbitmq.*.log
tags: ["crash", "rabbitmq"]
multiline.type: pattern
multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
multiline.negate: true
multiline.match: after
# filestream is an experimental input. It is going to replace log input in the future.
- type: filestream
# Change to true to enable this input configuration.
enabled: false
# Paths that should be crawled and fetched. Glob based paths.
paths:
- /var/log/*.log
#- c:\programdata\elasticsearch\logs\*
# ============================== Filebeat registry =============================
filebeat.registry.path: ${path.data}/registry
filebeat.registry.file_permissions: 0600
# ============================== Filebeat modules ==============================
filebeat.config.modules:
# Glob pattern for configuration loading
path: ${path.config}/modules.d/*.yml
# Set to true to enable config reloading
reload.enabled: false
# Period on which files under path should be checked for changes
#reload.period: 10s
# ======================= Elasticsearch template setting =======================
setup.template.settings:
index.number_of_shards: 1
#index.codec: best_compression
#_source.enabled: false
# ================================== General ===================================
# ================================= Dashboards =================================
# =================================== Kibana ===================================
# =============================== Elastic Cloud ================================
# ================================== Outputs ===================================
# Configure what output to use when sending the data collected by the beat.
# ---------------------------- Elasticsearch Output ----------------------------
# ------------------------------ Logstash Output -------------------------------
output.logstash:
# The Logstash hosts
hosts: ["ip:5000"]
# Optional SSL. By default is off.
# List of root certificates for HTTPS server verifications
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
# Certificate for SSL client authentication
#ssl.certificate: "/etc/pki/client/cert.pem"
# Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"
# ================================= Processors =================================
processors:
- add_host_metadata:
when.not.contains.tags: forwarded
- add_cloud_metadata: ~
- add_docker_metadata: ~
- add_kubernetes_metadata: ~
# ================================== Logging ===================================
# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster. This requires xpack monitoring to be enabled in Elasticsearch. The
# reporting is disabled by default.
# Set to true to enable the monitoring reporter.
#monitoring.enabled: false
# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
# ============================== Instrumentation ===============================
# Instrumentation support for the filebeat.
#instrumentation:
# Set to true to enable instrumentation of filebeat.
#enabled: false
# Environment in which filebeat is running on (eg: staging, production, etc.)
#environment: ""
# APM Server hosts to report instrumentation results to.
#hosts:
# - http://localhost:8200
# API Key for the APM Server(s).
# If api_key is set then secret_token will be ignored.
#api_key:
# Secret token for the APM Server(s).
#secret_token:
# ================================= Migration ==================================
# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true
Logstash, Elasticsearch and Kibana all run on the same machine. It has 192 GB of free disk space. I don't know, how can I check that?
Logstash runs in Docker and its logs are not saved. These are the docker-compose settings for Logstash:
logstash:
build:
context: logstash/
container_name: logstash
volumes:
- ./logstash/config/logstash.yml:/usr/share/logstash/config/logstash.yml:ro
- ./logstash/pipeline:/usr/share/logstash/pipeline:ro
ports:
- "5000:5000"
environment:
LS_JAVA_OPTS: "-Xmx256m -Xms256m"
networks:
- elk
depends_on:
- elasticsearch
The first thing I would try is setting the Logstash heap to 4 GB.
You could mount the Logstash logs to a volume so you can see them.
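Only the changed parts of your compose service, as a sketch (the host log path is just an example; also note the official image logs to the container console by default, so docker logs may already show them unless log4j2 is set to write files):
logstash:
  volumes:
    - ./logstash/config/logstash.yml:/usr/share/logstash/config/logstash.yml:ro
    - ./logstash/pipeline:/usr/share/logstash/pipeline:ro
    # keep Logstash's own log files on the host so they survive a crash
    - ./logstash/logs:/usr/share/logstash/logs
  environment:
    # raise the heap from 256m
    LS_JAVA_OPTS: "-Xmx4g -Xms4g"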
How much heap are you giving Elasticsearch?
How much total RAM and how many CPUs does the server this is all running on have?
elasticsearch:
build:
context: elasticsearch/
container_name: elasticsearch
volumes:
- ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
- /home/elasticsearch:/usr/share/elasticsearch/data
ports:
- "9200:9200"
environment:
ES_JAVA_OPTS: "-Xmx4096m -Xms4096m"
free -h
total used free shared buff/cache available
Mem: 11G 5.5G 317M 69M 5.7G 5.7G
Swap: 5.9G 39M 5.8G
cat /proc/cpuinfo | grep core
cpu cores : 2
cpu MHz : 1995.000
Ok, so 4 GB to Elasticsearch.
Then try giving 2 GB to Logstash.
Not a huge server to run all of this on...
Only 2 cores... which Elasticsearch and Logstash are fighting over.
It is generally not best practice to run Logstash and Elasticsearch on the same server.
With Docker, so-so...
Maybe for small testing, but for production you would need a bigger server... and make sure Elasticsearch and Logstash have plenty of RAM and CPU.
The 11M logs over what time frame?
Our website is not that big right now. It's only 200-400 requests per hour and the ELK stack works fine. We will scale up the machine if the load increases. I just want to find something like throttling to avoid a situation like this again.
It was 11 M logs in 15-30 minutes.
Well, that is quite a spike, right?
10 M events in 30 minutes is about 5.5 K events per second, which is tens of thousands of times your normal load... It's going to be very hard to design a system that is sized for only 400 events per hour but also handles 5,000 per second; those are two pretty different systems.
BUT, that said, you could look at the persistent queue... though I'm still not sure that will be enough on its own.
If you know when such a spike is going to come, you could scale Logstash up and then scale it back down.
But this is a classic system design question of designing for average versus peak usage... there are always trade-offs.
I see. It was the first time this has ever happened. Maybe it was a DDoS or something else.
I will read about the queue in Logstash, thank you.
Yeah, usually people try to detect DDoS closer to the edge / firewall, etc., or you could use something like Kafka, which is much better at managing back pressure. But go ahead and try the persistent queue first; that is a huge spike, though.
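If you ever go the Kafka route, one common arrangement is Filebeat writing to Kafka and Logstash consuming from it with its kafka input. The Filebeat side would look roughly like this (a sketch only; the broker addresses and topic name are made up for the example, and it would replace your current output.logstash section):
output.kafka:
  # Kafka brokers acting as the buffer between Filebeat and Logstash
  hosts: ["kafka1:9092", "kafka2:9092"]
  # topic that Logstash would read from with its kafka input
  topic: "eias-web-logs"
  required_acks: 1
  compression: gzip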
Ohhh... apologies @Andrey_RF.
I may have misunderstood: was this the very first time you started Logstash? If so, it may have gone back and read all the old logs / files in the directory... that's what it does, so it may have tried to load all of the old logs.
Or was it definitely a spike within that time window?
No no no. It had been working for a week before it crashed. The logs have correct timestamps, so you can see a chart.
Yup, that won't be easy...
You could try giving it more heap and some persistent queues... that may or may not help.
As long as the system keeps up... there will be nothing in the queue.