Many workers used when only 1 is expected

Nicolas_Guyomard · January 3, 2018, 2:46pm

I use the aggregate filter plugin to extract metrics from my application's log, using the following config:

input {
   file {
      path => "application.log"
      start_position => "beginning"
      sincedb_path => "/dev/null"
   }
}

filter {
    grok {
        match => {
            "message" => "%{LOG_PATTERN}"
        }
    }

    if ([module] == "input") {
        aggregate {
            task_id => "%{req_id}"
            code => "
                map['in_timestamp'] = event.get('timestamp')
                map['module_time'] = {}
            "
            map_action => "create"
        }
    }
    else if ([module] == "output") {
        aggregate {
            task_id => "%{req_id}"
            code => "
                event.set('module_time', map['module_time'])
                event.set('in_timestamp', map['in_timestamp'])
            "
            map_action => "update"
            end_of_task => true
        }
    }
    else {
        aggregate {
            task_id => "%{req_id}"
            code => "
                map['module_time'][event.get('module')] = event.get('timestamp')
            "
            map_action => "update"
        }
    }
}

output {
   file {
      path => "application.json"
      codec => "rubydebug"
   }
}

I set the number of Logstash filter workers to 1, both in logstash.yml (pipeline.workers: 1) and on the command line (-w 1), but it looks like several workers are being used.
In the output, the event for line 2016 of my log has a timestamp earlier than the event for line 2002.
Both lines belong to the same req_id, but line 2016 comes from the output module, which ends the task, so I am losing data.

I can fix the issue for this req_id by changing the value of "pipeline.batch.size" in logstash.yml, but the issue then occurs for other req_id values.
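
For reference, the two settings mentioned above would look roughly like this in logstash.yml. This is only a sketch of the setup described in this post: the batch size value is an assumption, taken from the batch_size reported by the node info API output later in this thread, and is not a recommended value.

```yaml
# logstash.yml (sketch of the settings discussed in this post)

# Single filter/output worker, equivalent to "-w 1" on the command line.
pipeline.workers: 1

# Maximum number of events a worker collects before running the filters.
# 1000 is assumed here because that is what the node info API reports;
# changing this value moved the problem to other req_id values.
pipeline.batch.size: 1000
```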

What am I doing wrong?

How can I check the number of workers used by Logstash?

Nicolas_Guyomard · January 4, 2018, 1:45pm

The node info API (curl -XGET 'localhost:9600/_node/pipelines?pretty') tells me that there is only 1 worker:

  "pipelines" : {
    "main" : {
      "workers" : 1,
      "batch_size" : 1000,
      "batch_delay" : 10,
      "config_reload_automatic" : false,
      "config_reload_interval" : 3000000000,
      "dead_letter_queue_enabled" : false
    }
  }

So what explains the events being interleaved?

