CSV Output Plugin - Duplicate Entries

Hi All,

I am using Logstash to write to a CSV file and to Elasticsearch. The Elasticsearch output is fine, but the csv output plugin appends duplicate entries each time the pipeline runs. I have set the following based on the csv filter documentation:

  • skip_header: true
  • defined column names
  • set pipeline workers to 1
  • set java_execution to false

From the documentation on skip_header: if skip_header is set without autodetect_column_names being set, then columns should be set, which will result in the skipping of any row that exactly matches the specified column values. Logstash pipeline workers must be set to 1 for this option to work.

Here is the configuration file:

input
{
  http_poller
  {
    urls =>
    {
      mispevents =>
      {
        method  => post
        url     => "https://hostname.domain.com/events/csv/download"
        headers =>
        {
          Authorization  => "${MISP_TOKEN}"
          "Content-Type" => "application/json"
        }
        body => '{"ignore": "True", "tags": ["con=high"], "type": ["type1"], "last": "400d"}'
      }
    }
    cacert   => "/u/elasticStack/cacerts/cacerts.pem"
    schedule => { every => "1m" }
    codec    => "line"
  }
}

filter
{

  csv
  {
    skip_header => "true"
    columns => ["uuid","event_id","category","type","value","comment","to_ids","date","object_uuid","object_name","object_meta_category"]
    add_field => { "priority" => "6"}
  }

  if [type] == "type1" {
     mutate {
        copy => { "value" => "type1" }
     }
     mutate {
        add_field => { "misp_key" => "%{value}" "misp_value" => " %{category},%{comment},%{priority}" }
     }
  }

  mutate {
     copy => { "category" => "name" "event_id" => "eventid" "comment" => "description" }
  }

  mutate {
    remove_field => [ "message", "object_meta_category", "object_name", "@version", "@timestamp", "object_uuid", "date", "to_ids", "event_id" ]
  }
}

output
{

  stdout { codec => rubydebug }

  csv {
     path => "/u/elasticStack/data/misp.csv"
     csv_options => { "col_sep" => ":" }
     fields => ["misp_key","misp_value"]
  }

}

Please let me know if there is anything I am missing.

Thanks
Murali

How does that differ from the behaviour that you want?

Hi Badger,

I would prefer the csv file to contain only unique entries, no matter how many times the pipeline runs.

Thanks
Murali

That's not how it works. Whatever the URL you are polling every minute returns is fed to Logstash as events, and the csv output simply appends each one to the file. If you want to check whether you have seen an event before, you would need a database containing the events already seen.
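For reference, one common pattern (a sketch only, not something from this thread) is to use the fingerprint filter to derive a stable ID from the fields that make an event unique, and use it as the Elasticsearch document ID, so repeated polls overwrite the same document instead of creating duplicates. This deduplicates the index, not the flat file; the CSV would then have to be regenerated from the index. The host, index name, and choice of source fields below are all assumptions:

```
filter {
  # Build a stable ID from the fields assumed to identify an event uniquely.
  fingerprint {
    source              => ["misp_key", "misp_value"]
    concatenate_sources => true
    method              => "SHA256"
    target              => "[@metadata][fingerprint]"
  }
}

output {
  elasticsearch {
    hosts       => ["https://localhost:9200"]        # assumed host
    index       => "misp-events"                     # assumed index name
    document_id => "%{[@metadata][fingerprint]}"     # same event overwrites, never appends
  }
}
```

With this in place a re-polled event updates its existing document rather than adding a new one; the csv output plugin has no equivalent mechanism, since it can only append to the file.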

Hi Badger,

Thank you

Murali

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.