Hi there. I have a .csv file in which my @timestamp field is formatted like this:
0043838407D89D6773491A20160918215104.8191+020000
How should I configure Logstash to make it ignore the first part and correctly represent the timestamp?
My configuration file is the following:
input {
  file {
    path => "path/to/my/file.csv"
    type => "test"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => [
      "@timestamp",
      # ... a lot of other columns ...
    ]
    separator => ","
  }
  date {
    match => [ "@timestamp", "???" ]
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
  }
  stdout { codec => rubydebug }
}
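(A note for anyone testing this repeatedly: the file input tracks how far it has read in a sincedb, so start_position => "beginning" only takes effect the first time a file is seen; setting sincedb_path => "/dev/null" is a common way to force a full re-read on each run.)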
Thanks a lot
warkolm (Mark Walkom) October 1, 2016, 10:19am
Where is the actual date in that?
20160918215104.8191+020000
You see, it looks like YYYYMMddHHmmss.SSSSZ, but I'm not sure it would work, and I don't know how to ignore the initial text.
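If I split the sample up that way (assuming the last two zeros aren't part of the offset):

0043838407D89D6773491A   <- leading data to skip
20160918215104           <- date and time (YYYYMMddHHmmss)
.8191                    <- fractional seconds (.SSSS)
+0200                    <- UTC offset (Z)
00                       <- two trailing digits that don't seem to belong to the timestamp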
Thanks
Use a grok filter to extract the pieces you're interested in.
You mean using grok instead of csv? Or something else?
Thanks
By all means keep using csv, but use a grok filter to extract the interesting parts of the first field. Something like
grok {
  match => {
    "@timestamp" => ".*(?<@timestamp>\d{14}\.\d+[+-]\d{4})\d\d$"
  }
}
would work. If the number of characters that comes before the timestamp is fixed, it's possible to write a more exact pattern that's faster.
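For example, if the leading part is always 22 characters, as in your sample value, an anchored pattern along these lines avoids the leading .* and the backtracking it causes (the capture name here is just illustrative):

grok {
  match => {
    "@timestamp" => "^.{22}(?<ts>\d{14}\.\d+[+-]\d{4})\d\d$"
  }
}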
Thanks for the kind reply; I'm starting to understand some nice features.
I have removed the date filter and added your grok filter, loading the following configuration file:
input {
  file {
    path => "path/to/my/file.csv"
    type => "test"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => [
      "@timestamp",
      # ... a lot of other columns ...
    ]
    separator => ","
  }
  grok {
    match => {
      "@timestamp" => ".*(?<@timestamp>\d+\.\d{4}[+-]\d{4})\d\d$"
    }
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
  }
  stdout { codec => rubydebug }
}
But unfortunately, Logstash is now returning this error:
Settings: Default pipeline workers: 2
Pipeline aborted due to error {:exception=>"RegexpError", :backtrace=>["org/jruby/RubyRegexp.java:1434:in `initialize'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-grok-0.11.3/lib/grok-pure.rb:127:in `compile'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-2.0.5/lib/logstash/filters/grok.rb:264:in `register'", "org/jruby/RubyArray.java:1613:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-2.0.5/lib/logstash/filters/grok.rb:259:in `register'", "org/jruby/RubyHash.java:1342:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-2.0.5/lib/logstash/filters/grok.rb:255:in `register'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.4.0-java/lib/logstash/pipeline.rb:182:in `start_workers'", "org/jruby/RubyArray.java:1613:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.4.0-java/lib/logstash/pipeline.rb:182:in `start_workers'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.4.0-java/lib/logstash/pipeline.rb:136:in `run'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.4.0-java/lib/logstash/agent.rb:491:in `start_pipeline'"], :level=>:error}
stopping pipeline {:id=>"main"}
I can't understand what's going on.
You still need the date filter, and when the @timestamp field contains something it can parse it'll actually work. But it seems the regexp library doesn't want to capture into a field with @ in it, so let's call it plain timestamp instead. This works:
filter {
  grok {
    match => {
      "@timestamp" => ".*(?<timestamp>\d{14}\.\d{4}[+-]\d{4})\d\d$"
    }
  }
  date {
    match => ["timestamp", "YYYYMMddHHmmss.SSSSZ"]
    remove_field => ["timestamp"]
  }
}
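For reference, the date filter stores the parsed result in @timestamp by default, which is why the original field simply gets overwritten here; the optional target option makes that explicit:

date {
  match => ["timestamp", "YYYYMMddHHmmss.SSSSZ"]
  target => "@timestamp"   # the default; spelled out for clarity
  remove_field => ["timestamp"]
}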
Yes, now the pipeline is working, but there is still something wrong with the date:
Failed parsing date from field {:field=>"timestamp", :value=>"0043838407D89D6773491A20160918215104.8191+020000", :exception=>"Invalid format: \"0043838407D89D6773491A20160918215104.8191+...\" is malformed at \"D89D6773491A20160918215104.8191+...\"", :config_parsers=>"YYYYMMddHHmmss.SSSSZ", :config_locale=>"default=en", :level=>:warn}
It works for me, so you're doing something differently:
$ cat test.config
input { stdin {} }
output { stdout { codec => rubydebug } }
filter {
  grok {
    match => {
      "message" => ".*(?<timestamp>\d{14}\.\d{4}[+-]\d{4})\d\d$"
    }
  }
  date {
    match => ["timestamp", "YYYYMMddHHmmss.SSSSZ"]
    remove_field => ["timestamp"]
  }
}
$ echo '0043838407D89D6773491A20160918215104.8191+020000' | logstash -f test.config
Settings: Default pipeline workers: 8
Pipeline main started
{
       "message" => "0043838407D89D6773491A20160918215104.8191+020000",
      "@version" => "1",
    "@timestamp" => "2016-09-18T19:51:04.819Z",
          "host" => "bertie"
}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}
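Note, by the way, that the resulting @timestamp is stored in UTC, which is why 21:51:04 at a +02:00 offset shows up as 2016-09-18T19:51:04.819Z.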
Got it! I forgot to change the field name inside the grok filter.
Thanks for your kind support; this topic greatly improved my (limited) knowledge.