Pipeline workers

Fuzzy66 · December 9, 2016, 4:46am

I am trying to read 400+ files with Logstash, and it feels like it's not able to keep up with all of them.

I have increased the number of pipeline workers, but nothing seems to help.

Should I break it down into multiple Logstash instances, or should I raise the number of pipeline workers to a very large value? I am currently using 36 pipeline workers and a batch size of 350.

warkolm · December 9, 2016, 6:15am

What does your config look like?

magnusbaeck · December 9, 2016, 6:39am

What outputs do you have? Are they able to keep up with the inflow?

mrunalgosar · December 9, 2016, 6:53am

Consider this setup: do the reading with Filebeat, which is lightweight. If you need some parsing or processing, then either (a) feed the events into a Logstash cluster, or (b) feed them into a queue (such as Kafka or Redis) and let Logstash read from that queue.

Fuzzy66 · December 9, 2016, 2:15pm

I use a file input with some filtering options and send the events to a Redis server. A second Logstash instance reads from Redis and writes to Elasticsearch.

My application server, where the Logstash instance that reads the files runs, has the most processing power and memory, so I perform all the filtering there and the rest of the chain just passes the data through into Elasticsearch.

As far as I can tell, my Redis server is not backing up with work, and the database is empty.

input {
  file {
    path => ["/path/1/o.RATE*.log", "/path/2/o.RATE*.log", "/path/3/o.RATE*.log", "/path/4/o.RATE*.log"]
    type => "rate_log"
    codec => multiline {
      pattern => "^\w+[-]+"
      negate => true
      what => "previous"
    }
  }
}

filter {
  mutate {
    add_field => {
      "Datacenter" => "DC"
      "Application" => "App"
      "ProcessType" => "RATE"
      "LogType" => "Daemon"
    }
  }
  if [type] == "rate_log" {
    grok {
      match => { "message" => "%{WORD:Level}-*(?<Script>[0-9A-Za-z]+\.[a-z]+):(?<Numbers>[0-9]+):%{DATE_EU:Date} %{TIME:Time} > %{GREEDYDATA:LogMessage}" }
    }
    grok {
      match => { "path" => "%{GREEDYDATA}/%{GREEDYDATA:filename}" }
    }
    grok {
      match => { "path" => "%{GREEDYDATA}/%{GREEDYDATA:group}/%*" }
    }
  }
  if [Level] == "TRACE" { drop {} }
  if [LogMessage] == "Polling for 5000 milliseconds." { drop {} }
}

output {
  redis {
    host => "hostname.com"
    port => 6480
    data_type => "list"
    key => "logs"
  }
}

Fuzzy66 · December 9, 2016, 6:23pm

Even after increasing the worker and pipeline options, I still get the following warning.

[2016-12-09T11:51:10,475][WARN ][logstash.pipeline ] CAUTION: Recommended inflight events max exceeded! Logstash will run with up to 204000 events in memory in your current configuration. If your message sizes are large this may cause instability with the default heap size. Please consider setting a non-standard heap size, changing the batch size (currently 2550), or changing the number of pipeline workers (currently 80)
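The 204000 figure in the warning matches the number of pipeline workers multiplied by the batch size. A minimal Python sketch of that arithmetic follows; the function names and the 2 KiB average event size are illustrative assumptions, not Logstash internals or measured values.

```python
def max_inflight_events(pipeline_workers: int, batch_size: int) -> int:
    """Upper bound on in-memory events if every worker holds one full batch."""
    return pipeline_workers * batch_size


def estimated_inflight_bytes(pipeline_workers: int, batch_size: int,
                             avg_event_bytes: int) -> int:
    """Rough memory estimate; avg_event_bytes is a guess, not a measured value."""
    return max_inflight_events(pipeline_workers, batch_size) * avg_event_bytes


if __name__ == "__main__":
    # Settings reported in the warning: 80 workers, batch size 2550.
    print(max_inflight_events(80, 2550))   # 204000
    # Settings from the first post: 36 workers, batch size 350.
    print(max_inflight_events(36, 350))    # 12600
    # Hypothetical 2 KiB average event size (multiline events may be larger).
    print(estimated_inflight_bytes(80, 2550, 2048) / 2**20)  # size in MiB
```

Raising either setting raises this upper bound, which is what the warning is pointing at. It does not by itself show that the file input reads more files in parallel.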

I currently run a 2 GB heap, but whatever I set these values to, I can never get the JVM thread count to go up.

"jvm": {
    "threads": {
        "count": 163,
        "peak_count": 164
    },
    "mem": {
        "heap_used_in_bytes": 1761126960,
        "heap_used_percent": 42,
        "heap_committed_in_bytes": 2572869632,
        "heap_max_in_bytes": 4151836672,
        "non_heap_used_in_bytes": 218644432,
        "non_heap_committed_in_bytes": 231473152,
        "pools": {
            "survivor": {
                "peak_used_in_bytes": 131072,
                "used_in_bytes": 261800,
                "peak_max_in_bytes": 71565312,
                "max_in_bytes": 143130624,
                "committed_in_bytes": 262144
            },
            "old": {
                "peak_used_in_bytes": 965532816,
                "used_in_bytes": 1759699264,
                "peak_max_in_bytes": 1431699456,
                "max_in_bytes": 2863398912,
                "committed_in_bytes": 2570510336
            },
            "young": {
                "peak_used_in_bytes": 1048576,
                "used_in_bytes": 1165896,
                "peak_max_in_bytes": 572653568,
                "max_in_bytes": 1145307136,
                "committed_in_bytes": 2097152
            }
        }
    },
    "gc": {
        "collectors": {
            "old": {
                "collection_time_in_millis": 1734,
                "collection_count": 38
            },
            "young": {
                "collection_time_in_millis": 443912,
                "collection_count": 54593
            }
        }
    }
}
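As a rough sanity check on these figures, the sketch below recomputes the heap usage percentage, the maximum heap in GiB, and the average pause per collection from the values pasted above. The helper names are made up for illustration, and only the fields needed for the arithmetic are copied.

```python
# Subset of the JVM stats pasted above (values copied verbatim).
jvm = {
    "mem": {
        "heap_used_in_bytes": 1761126960,
        "heap_max_in_bytes": 4151836672,
    },
    "gc": {
        "collectors": {
            "old": {"collection_time_in_millis": 1734, "collection_count": 38},
            "young": {"collection_time_in_millis": 443912, "collection_count": 54593},
        }
    },
}


def heap_used_percent(mem: dict) -> int:
    """Used heap as a whole-number percentage of the maximum heap."""
    return mem["heap_used_in_bytes"] * 100 // mem["heap_max_in_bytes"]


def heap_max_gib(mem: dict) -> float:
    """Maximum heap size in GiB."""
    return mem["heap_max_in_bytes"] / 2**30


def avg_gc_millis(collector: dict) -> float:
    """Average time per collection in milliseconds."""
    return collector["collection_time_in_millis"] / collector["collection_count"]


if __name__ == "__main__":
    print(heap_used_percent(jvm["mem"]))                               # 42
    print(round(heap_max_gib(jvm["mem"]), 2))                          # 3.87
    print(round(avg_gc_millis(jvm["gc"]["collectors"]["young"]), 2))   # 8.13
    print(round(avg_gc_millis(jvm["gc"]["collectors"]["old"]), 2))     # 45.63
```

The reported maximum heap works out to about 3.87 GiB, so it may be worth comparing that against the intended heap setting.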

magnusbaeck · December 12, 2016, 6:42am

> ... and the database is empty.

What do you mean by this? Are events not reaching Elasticsearch at all?

Fuzzy66 · December 13, 2016, 2:20pm

Elasticsearch is full of data, but I don't think it's all of the data.

When I graph the number of records per file against the Logstash @timestamp in Kibana, I see a huge burst of reads from one file at a time. Logstash then moves on to the next file and does not get back to the first file for 20 minutes or longer. I would expect Logstash to be able to tail all files at the same time if it has enough memory and pipeline workers.

So I have kept trying to scale Logstash up to handle all the files at the same time, but so far I have not seen that happen.

system · January 10, 2017, 2:20pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
