Slow CSV input/filtering compared to a Python script

Hi, I have a CSV file with 4 columns that needs to be processed and aggregated by Logstash before being sent to Elasticsearch. In my testing I tuned the JVM heap size up, raised the pipeline batch size to 50k, and set the batch delay to 1m, but it still took 1~2 minutes to aggregate the 1.5 million rows in a 48 MB file. A simple Python script took only 0.5 seconds. Am I doing anything wrong here?
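
For reference, the Python comparison is essentially a group-by count like the minimal sketch below (not my exact script; the filename is a placeholder):

import csv
from collections import Counter

# Count rows per (id, azimuth, elevation), the same key the aggregate filter uses.
counts = Counter()
with open("data.csv", newline="") as f:        # "data.csv" is a placeholder filename
    reader = csv.reader(f)
    next(reader)                               # skip the header row
    for id_, abc_id, azimuth, elevation in reader:
        counts[(id_, azimuth, elevation)] += 1

for (id_, azimuth, elevation), count in counts.items():
    print(f"{id_},{azimuth},{elevation},{count}")
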
Below is my filter config; the input and output sections are very simple (I've sketched them after the filter for reference).

filter {

  csv {
    separator => ","
    skip_header => "true"
    columns => ["id", "abc-id", "azimuth", "elevation"]
    convert => {
      "id" => "integer"
      "azimuth" => "integer"
      "elevation" => "integer"
    }
  }
  aggregate {
    task_id => "%{id}|%{azimuth}|%{elevation}"
    code => "
        map['count'] ||= 0; map['count'] += 1;
        map['id'] = event.get('id')
        map['azimuth'] = event.get('azimuth')
        map['elevation'] = event.get('elevation')
    "
    push_map_as_event_on_timeout => true
    timeout => 120 # already increased from the 2-second example value
    timeout_code => "
        event.set('[@metadata][wanted]', true)
    "
  }
  if ![@metadata][wanted] { drop {} } # Drop the raw events, just keep the aggregates
}
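
For completeness, the input/output is essentially just a file input and an elasticsearch output along these lines (path, host, and index name are placeholders):

input {
  file {
    path => "/path/to/data.csv"          # placeholder path
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # placeholder host
    index => "my-index"                  # placeholder index name
  }
}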
