Unstable Events Rate/Throughput

(Artem Kovalchuk) #1

Hello!

I was watching the Filebeat metrics and noticed that the event rate and throughput look like a jigsaw pattern.

It's not because there is nothing to send: our log files grow faster than Filebeat sends events to Elasticsearch. I made a graph to visualize it.

The orange graph is the sum of the sizes of all logs that should be processed by Logstash.
The green one is the sum of the offsets in the Filebeat registry file.
The delta between these two graphs grows slowly during the day, and at night (when our service is not under load) the graphs converge to the same value. Of course I want the delta to be as small as possible, but I think the unstable throughput plays the main role in my problem.
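The two graphs described above can be sketched as a small script. This is a minimal illustration, not part of the original setup: the registry path is hypothetical, and it assumes the Filebeat 6.x registry file is a JSON array of entries that each carry an `offset` field (verify against your own registry before relying on it).

```python
import glob
import json
import os

# Hypothetical paths -- adjust to your deployment.
LOG_GLOB = "/data/filebeat/logs/*/beat.*"
REGISTRY = "/var/lib/filebeat/registry"


def total_log_bytes(pattern):
    """Sum of on-disk sizes of all matching log files (the orange graph)."""
    return sum(os.path.getsize(p) for p in glob.glob(pattern))


def total_registry_offset(registry_path):
    """Sum of read offsets recorded in the registry (the green graph).

    Assumes the 6.x registry format: a JSON array of per-file entries,
    each with an "offset" field.
    """
    with open(registry_path) as f:
        entries = json.load(f)
    return sum(e.get("offset", 0) for e in entries)


def backlog_bytes(pattern=LOG_GLOB, registry_path=REGISTRY):
    """Delta between bytes written and bytes shipped: the unread backlog."""
    return total_log_bytes(pattern) - total_registry_offset(registry_path)
```

Sampling `backlog_bytes()` periodically gives the same delta the graphs show: it should trend toward zero whenever Filebeat keeps up.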

The fail/error monitoring graphs look good.

I use Elasticsearch and Filebeat 6.5.4.
My filebeat.yml configuration:

filebeat.inputs:
- type: log
  enabled: true
  paths:
  - /data/filebeat/logs/*/beat.*
  json.keys_under_root: true
  json.overwrite_keys: true
  json.add_error_key: true
  close_inactive: 1h

processors:
- drop_fields:
    fields: ["beat", "input", "offset", "prospector", "source", "host"]
- drop_event:
    when:
      not:
        has_fields: ["indexKey"]

setup.template.enabled: false
xpack.monitoring.enabled: true

queue.mem:
  events: 8192
  flush.min_events: 2048
  flush.timeout: 1s

output.elasticsearch:
  hosts: ["url_to_elasticsearch"]
  username: "XXXXXX"
  password: "XXXXXX"
  index: "%{[indexKey]}-%{+yyyy.MM.dd}"
  bulk_max_size: 2048
  worker: 4

logging.level: info
logging.to_files: true

Thank you in advance, any help will be very appreciated!

(Christian Dahlqvist) #2

Filebeat can only send as fast as downstream systems, e.g. Elasticsearch, are able to accept data, so it is worth looking at how Elasticsearch is performing and verifying whether or not it is the bottleneck. Do you e.g. see high CPU usage on Elasticsearch? Do you see evidence of long and/or frequent GC? If you look at stats, are merges being throttled? What does disk I/O and iowait look like?
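The checklist above can be made concrete by pulling a few indicators out of an Elasticsearch `GET _nodes/stats` response. The sketch below assumes the node-stats field names shown (merge throttling, old-generation GC, CPU percent); verify them against your cluster's actual output, since they can vary across versions.

```python
def bottleneck_signals(node_stats):
    """Extract per-node bottleneck indicators from a parsed
    GET _nodes/stats response (a dict).

    Assumed field layout:
      indices.merges.total_throttled_time_in_millis  -- merge throttling
      jvm.gc.collectors.old.*                        -- old-gen GC pressure
      os.cpu.percent                                 -- CPU usage
    """
    signals = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        merges = node.get("indices", {}).get("merges", {})
        gc_old = (node.get("jvm", {}).get("gc", {})
                      .get("collectors", {}).get("old", {}))
        signals[node.get("name", node_id)] = {
            "merge_throttled_ms": merges.get("total_throttled_time_in_millis", 0),
            "old_gc_count": gc_old.get("collection_count", 0),
            "old_gc_time_ms": gc_old.get("collection_time_in_millis", 0),
            "cpu_percent": node.get("os", {}).get("cpu", {}).get("percent"),
        }
    return signals
```

Rising `merge_throttled_ms` or `old_gc_time_ms` between two samples, taken together with high CPU or iowait on the Elasticsearch hosts, points at Elasticsearch rather than Filebeat as the bottleneck.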

(Artem Kovalchuk) #3

Thank you for the advice! You're right, Elasticsearch was the bottleneck.
I monitored our metrics and noticed that some of our old services were sending events directly to Elasticsearch using a large number of threads (without Filebeat).
I've migrated all our services from direct event sending to Filebeat, and the problem is gone.
The Filebeat graphs look much better now.

(system) closed #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.