Logstash CSV Filter: Define Columns from Event Data/Fields

dm90 · January 4, 2019, 10:36pm

Use case:

The S3 input plugin can read AWS CloudFront logs from an S3 bucket. The resulting events include the message (a tab-delimited log line) and a field called "cloudfront_fields" that identifies each field in the log.

It is straightforward to mutate the "cloudfront_fields" into an array of column headers:

{
  "type" => "cloudfront",  
  "cloudfront_version" => "1.0",
  "@version" => "1",
  "@timestamp" => 2019-01-04T22:08:40.319Z,
  "message" => "tab\tdelimited\tlog\tentry\t...",
  "cloudfront_fields" => [
     [ 0] "date",
     [ 1] "time",
     [ 2] "x-edge-location",
     [ 3] "sc-bytes",
     [ 4] "c-ip",
     [ 5] "cs-method",
     [ 6] "cs(Host)",
     [ 7] "cs-uri-stem",
     [ 8] "sc-status",
     [ 9] "cs(Referer)",
     [10] "cs(User-Agent)",
     [11] "cs-uri-query",
     [12] "cs(Cookie)",
     [13] "x-edge-result-type",
     [14] "x-edge-request-id",
     [15] "x-host-header",
     [16] "cs-protocol",
     [17] "cs-bytes",
     [18] "time-taken",
     [19] "x-forwarded-for",
     [20] "ssl-protocol",
     [21] "ssl-cipher",
     [22] "x-edge-response-result-type",
     [23] "cs-protocol-version",
     [24] "fle-status",
     [25] "fle-encrypted-fields"
  ]
}

I cannot figure out how to use the [cloudfront_fields] array as the "columns" input for the CSV filter:

input {
  s3 {
    "type" => "cloudfront"
    "bucket" => "${S3_BUCKET}"
    "prefix" => "${S3_BUCKET_PREFIX}"
    "region" => "${S3_BUCKET_REGION}"
    "additional_settings" => {
      "force_path_style" => true
      "follow_redirects" => false
    }
  }
}

filter {
  mutate {
    split => { "cloudfront_fields" => " " }
  }
  csv {
    separator => "\t"
    columns => [cloudfront_fields] # <-- everything I try here doesn't work
    target => "csv"
  }
}

output {
  stdout { codec => "rubydebug" }
}

I've read through "Accessing Event Data and Fields in the Configuration" and it does not appear to address this use case (or perhaps this use case deviates from those instructions).

Yes, I could hardcode a grok pattern, but it seems more robust to use the provided event data to map the fields. This would also handle the addition or removal of fields more gracefully, should that ever occur.

I am also open to suggestions of other filter plugin combinations that would accomplish the same thing and would still be more succinct than writing a ruby script.

Thanks.

wwalker · January 5, 2019, 10:51pm

The CSV plugin assumes the header column is a single string. If you can convert the array into a single string then it should work.

dm90 · January 6, 2019, 4:12pm

Are you referring to the CSV filter plugin? (not the input plugin)

The filter plugin requires an array: https://www.elastic.co/guide/en/logstash/current/plugins-filters-csv.html#plugins-filters-csv-columns

Am I misunderstanding the doc?

(I ended up just using Ruby for this, but I'm still interested in an answer.)
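
For anyone landing here later, a minimal sketch of what a Ruby-based mapping could look like (not necessarily the exact code used). It assumes "cloudfront_fields" has already been split into an array, that the message contains no quoted fields, and that a plain tab split is good enough. The helper name map_columns is hypothetical.

```ruby
# Hypothetical helper: pair each column name with the value at the same
# position in a separator-delimited log line.
def map_columns(columns, message, separator = "\t")
  # A limit of -1 keeps trailing empty fields, which String#split
  # would otherwise drop.
  values = message.split(separator, -1)
  # zip pads with nil when the line has fewer values than columns and
  # ignores extra values when it has more.
  columns.zip(values).to_h
end

# Inside a Logstash ruby filter the same logic could be applied to the
# event (assumption: the Event API's event.get / event.set), e.g.:
#   event.set("csv", map_columns(event.get("cloudfront_fields"),
#                                event.get("message")))
row = map_columns(%w[date time sc-bytes], "2019-01-04\t22:08:40\t512")
puts row["sc-bytes"]
```

Because the column names are read from the event at runtime, added or removed CloudFront fields would be picked up without touching the pipeline config.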

