How should I configure the date filter?

Hi there. I have a .csv file in which my @timestamp field is formatted like this:

0043838407D89D6773491A20160918215104.8191+020000

How should I configure Logstash so that it ignores the first part and correctly parses the timestamp?

My configuration file is the following:

input {
  file {
    path => "path/to/my/file.csv"
    type => "test"
    start_position => "beginning"
  }
}

filter {
  csv {
    columns => [
	    "@timestamp",
#           ... a lot of other columns ...
    ]
    separator => ","
  }
  date {
    match => [ "@timestamp" , "???" ]
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
  }
  stdout { codec => rubydebug }
}

Thanks a lot :slight_smile:

Where is the actual date in that?

20160918215104.8191+020000

You see, it looks like YYYYMMddHHmmss.ssssZ, but I'm not sure that would work, and I don't know how to ignore the initial text.
Thanks

Use a grok filter to extract the pieces you're interested in.

You mean using grok instead of csv? Or something else?

Thanks

By all means keep using csv, but use a grok filter to extract the interesting parts of the first field. Something like

grok {
  match => {
    "@timestamp" => ".*(?<@timestamp>\d{14}\.\d+[+-]\d{4})\d\d$"
  }
}

would work. If the number of characters before the timestamp is fixed, it's possible to write a more exact pattern that's also faster.

Thanks for the kind reply; I'm starting to understand some nice features :slight_smile:

I removed the date filter and added your grok filter, and I'm now loading the following configuration file:

input {
  file {
    path => "path/to/my/file.csv"
    type => "test"
    start_position => "beginning"
  }
}

filter {
  csv {
    columns => [
	    "@timestamp",
#           ... a lot of other columns ...
    ]
    separator => ","
  }
  grok {
    match => {
      "@timestamp" => ".*(?<@timestamp>\d+\.\d{4}[+-]\d{4})\d\d$"
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
  }
  stdout { codec => rubydebug }
}

But unfortunately, Logstash is now returning this error:

Settings: Default pipeline workers: 2
Pipeline aborted due to error {:exception=>"RegexpError", :backtrace=>["org/jruby/RubyRegexp.java:1434:in `initialize'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-grok-0.11.3/lib/grok-pure.rb:127:in `compile'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-2.0.5/lib/logstash/filters/grok.rb:264:in `register'", "org/jruby/RubyArray.java:1613:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-2.0.5/lib/logstash/filters/grok.rb:259:in `register'", "org/jruby/RubyHash.java:1342:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-2.0.5/lib/logstash/filters/grok.rb:255:in `register'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.4.0-java/lib/logstash/pipeline.rb:182:in `start_workers'", "org/jruby/RubyArray.java:1613:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.4.0-java/lib/logstash/pipeline.rb:182:in `start_workers'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.4.0-java/lib/logstash/pipeline.rb:136:in `run'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.4.0-java/lib/logstash/agent.rb:491:in `start_pipeline'"], :level=>:error}
stopping pipeline {:id=>"main"}

I can't understand what's going on :pensive:

You still need the date filter, and once the field it reads contains something it can parse it'll actually work. But it seems the regexp library doesn't allow capturing into a group with an @ in its name, so let's call the field plain timestamp instead. This works:

filter {
  grok {
    match => {
      "@timestamp" => ".*(?<timestamp>\d{14}\.\d{4}[+-]\d{4})\d\d$"
    }
  }
  date {
    match => ["timestamp", "YYYYMMddHHmmss.SSSSZ"]
    remove_field => ["timestamp"]
  }
}
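
Also, since the junk before the timestamp in your sample appears to be exactly 22 characters (an assumption based on that single line), you could anchor the pattern instead of starting with .*, which avoids needless backtracking:

grok {
  match => {
    # skip the fixed-length 22-character prefix, then capture the timestamp
    "@timestamp" => "^.{22}(?<timestamp>\d{14}\.\d{4}[+-]\d{4})\d\d$"
  }
}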

Yes, the pipeline is working now, but there is still something wrong with the date:

Failed parsing date from field {:field=>"timestamp", :value=>"0043838407D89D6773491A20160918215104.8191+020000", :exception=>"Invalid format: \"0043838407D89D6773491A20160918215104.8191+...\" is malformed at \"D89D6773491A20160918215104.8191+...\"", :config_parsers=>"YYYYMMddHHmmss.SSSSZ", :config_locale=>"default=en", :level=>:warn}

It works for me, so you must be doing something differently:

$ cat test.config
input { stdin {} }
output { stdout { codec => rubydebug } }
filter {
  grok {
    match => {
      "message" => ".*(?<timestamp>\d{14}\.\d{4}[+-]\d{4})\d\d$"
    }
  }
  date {
    match => ["timestamp", "YYYYMMddHHmmss.SSSSZ"]
    remove_field => ["timestamp"]
  }
}
$ echo '0043838407D89D6773491A20160918215104.8191+020000' | logstash -f test.config
Settings: Default pipeline workers: 8
Pipeline main started
{
       "message" => "0043838407D89D6773491A20160918215104.8191+020000",
      "@version" => "1",
    "@timestamp" => "2016-09-18T19:51:04.819Z",
          "host" => "bertie"
}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}
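
Note that in this test the raw line ends up in the message field, which is what the grok filter matches against. In your CSV pipeline the raw value lives in the @timestamp column, so the left-hand side of match has to name that field:

grok {
  match => {
    # match against the CSV column, not "message"
    "@timestamp" => ".*(?<timestamp>\d{14}\.\d{4}[+-]\d{4})\d\d$"
  }
}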

Got it! I forgot to change the field name inside the grok filter.
Thanks for your kind support; this topic greatly improved my (little) knowledge :slight_smile:
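
For anyone finding this later, the complete working filter section looks like this (column list abbreviated as in my earlier posts):

filter {
  csv {
    columns => [
      "@timestamp",
      # ... a lot of other columns ...
    ]
    separator => ","
  }
  grok {
    # capture into a plain "timestamp" field; named captures can't contain "@"
    match => {
      "@timestamp" => ".*(?<timestamp>\d{14}\.\d{4}[+-]\d{4})\d\d$"
    }
  }
  date {
    # YYYYMMddHHmmss = 14-digit date/time, SSSS = fractional seconds, Z = UTC offset
    match => ["timestamp", "YYYYMMddHHmmss.SSSSZ"]
    remove_field => ["timestamp"]
  }
}

As the test above shows, the parsed @timestamp comes out in UTC (21:51:04+0200 becomes 19:51:04Z) with the fraction truncated to milliseconds.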
