Filebeat poor performance when publishing existing file to ES


(John) #1

Hello,

I am using filebeat to load an existing log file directly into ES.

Sample config:

spool_size : 4049
publish_async: true
filebeat.prospectors:
- paths: [ "/data/test.log" ]
  harvester_buffer_size: 163840
  fields_under_root: true
  fields:
    type: "test1_log"

output.elasticsearch:
  worker: 8
  bulk_max_size : 4096
  hosts: ["localhost:9200"]
  pipelines:
    - pipeline: test-pipeline
      when.equals:
        type: "test1_log"
pipeline:
{
  "description" : "Test1 Log",
  "processors" : [
    {
      "grok" : {
        "field": "message",
        "patterns": [ "%{TIMESTAMP_ISO8601:date} %{NUMBER:reqtime} %{NUMBER:httpcode} %{IP:ip} %{GREEDYDATA:text}" ]
      }
    },
    {
      "date": {
        "field": "date",
        "target_field": "date",
        "formats": [ "yyyy-MM-dd HH:mm:ss" ]
      }
    },
    {
      "date_index_name" : {
        "field" : "date",
        "index_name_prefix" : "test1-",
        "date_rounding" : "d"
      }
    }
  ]
}
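Grok patterns are essentially named regular expressions, so the pattern in the pipeline above can be smoke-tested locally before indexing anything. A rough Python sketch, with simplified stand-ins for grok's TIMESTAMP_ISO8601, NUMBER, IP and GREEDYDATA sub-patterns; the sample log line is made up:

```python
import re

# Simplified Python approximation of the grok pattern:
# %{TIMESTAMP_ISO8601:date} %{NUMBER:reqtime} %{NUMBER:httpcode} %{IP:ip} %{GREEDYDATA:text}
LINE_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}) "   # TIMESTAMP_ISO8601 (simplified)
    r"(?P<reqtime>\d+(?:\.\d+)?) "                          # NUMBER
    r"(?P<httpcode>\d+) "                                   # NUMBER
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "                     # IP (IPv4 only)
    r"(?P<text>.*)"                                         # GREEDYDATA
)

# Made-up sample line matching the expected layout
sample = "2017-02-06 12:34:56 0.042 200 10.0.0.1 GET /index.html"
m = LINE_RE.match(sample)
print(m.groupdict())
```

If the match fails on a real line, the grok processor would reject the document in ES as well, so this is a cheap way to debug the pattern offline.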

I tried both a single-server ES setup (the default) and a two-server setup. Each server has 2x 16-core CPUs.

When I start filebeat, it appends log lines into ES at about 4,000 lines/second.
The filebeat process consumes about 15% of a CPU core, java (elasticsearch) consumes about 120-150% CPU. Disks are not overloaded.

So it looks like system resources are mostly idle, but the loading speed is low.
Is there anything I can tune to improve publishing speed until I hit some resource constraint on my server (CPU, disks)?

Thanks in advance.


(John) #2

This thread looks similar:


(Steffen Siering) #3

Please format logs and config files properly using the </> button.

Which filebeat version are you using? In 5.x it must say: filebeat.spool_size: and filebeat.publish_async.

With the configuration you have right now, load balancing is not active.

I'd recommend making filebeat.spool_size a multiple of worker times bulk_max_size, so batches are split into sub-batches that can be processed concurrently. This works even if publish_async is disabled.

e.g.:

filebeat.prospectors:
- paths: [ "/data/test.log" ]
  fields_under_root: true
  pipeline: "test-pipeline"

# spool_size = 2 * 8 * 4096 => split each batch into 16 sub-batches to be sent concurrently
filebeat.spool_size: 65536

# experimental feature, let's first test without it
# filebeat.publish_async: true

output.elasticsearch:
  worker: 8
  bulk_max_size: 4096
  hosts: ["localhost:9200"]

  # if fields.pipeline is not available in event, no pipeline will be used
  pipeline: "%{[pipeline]}"
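The sizing rule in the comment above can be sanity-checked with a few lines of arithmetic (a sketch; the numbers are just the values from the example config):

```python
# Sizing rule: make spool_size a multiple of worker * bulk_max_size,
# so each spooler flush splits evenly into sub-batches that keep all
# output workers busy at the same time.
worker = 8
bulk_max_size = 4096
spool_size = 2 * worker * bulk_max_size  # 2x headroom => 65536

sub_batches = spool_size // bulk_max_size
print(spool_size, sub_batches)  # 16 sub-batches in flight across 8 workers

# An evenly divisible spool_size avoids a small leftover sub-batch
# on every flush.
assert spool_size % (worker * bulk_max_size) == 0
```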

(Steffen Siering) #4

Be careful not to overload elasticsearch's internal pipelines (too-big batches or too many workers). In that case some events might not be indexed yet and must be retried by filebeat, which can slow down indexing. Check your filebeat log files.


(John) #5

I was using 5.1.2 and 5.2.0.
Yes, I replaced "spool_size" with "filebeat.spool_size" and speed improved to about 15,000 lines/second!

Thanks a lot!

PS: shouldn't filebeat report an error when I use (nonexistent) "spool_size" rather than "filebeat.spool_size", to avoid such confusion?


(Steffen Siering) #6

As throughput for beats very much depends on indexing performance in ES, you might want to play with 'worker', 'spool_size' and 'bulk_max_size' a little and see if you can push throughput somewhat higher. Do you have publish_async enabled?

PS: shouldn't filebeat report an error when I use (nonexistent) "spool_size" rather than "filebeat.spool_size", to avoid such confusion?

We're constantly improving configuration loading; see go-ucfg. Unfortunately, we cannot yet detect typos or settings placed at the wrong level. Related tickets: #10, #11, #6.


(John) #7

No, I removed it per your advice above. If I add it back, performance improves slightly.

I see, this is very sad, though understandable.

Thanks for your help!


(Steffen Siering) #8

With publish_async, filebeat prepares some batches in memory for sending, so there is some slight latency overhead when the setting is disabled. I'd keep it disabled if possible. You can also try increasing spool_size and workers and see if performance still improves somewhat, but maybe you're close to hitting a limit.


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.