I have a log file that contains two formats. The metadata is in a key-value format with '=' as the delimiter, and the actual data is in CSV format. Both are bundled into a single file. Can I generate each event as an individual document, with the metadata included in every document?
This should give you an idea of how to do it. It drops comments and lines that are just whitespace. Then it parses key=value lines and stashes them in a class variable. Then it parses any line with at least three commas as CSV and attaches the stashed metadata to that event. That leaves you with odds and ends to handle, such as
The RTC is running 0 hours, 0 mins and 2 secs behind real time.
The flash last updated 1/3/2018 at 2:04
The last .tab file placed 1/3/2018 at 14:01
If you need to get data out of those lines, use grok. Make sure you anchor your patterns using ^ so that lines which do not match fail quickly.
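# Drop comment lines.
if [message] =~ /^#/ {
    drop {}
# Drop lines that are empty or only whitespace.
} else if [message] =~ /^\s*$/ {
    drop {}
# key=value metadata lines: remember the pair, then discard the event.
} else if [message] =~ /^[A-Za-z0-9]+=/ {
    ruby {
        # @@metadata is a class variable, so it persists across events.
        # This relies on lines being processed in order, so run the
        # pipeline with pipeline.workers set to 1.
        init => '
            @@metadata = {}
        '
        code => '
            msg = event.get("message")
            matches = msg.scan(/^([A-Za-z0-9]+)=(.*)/)
            m = matches[0]
            @@metadata[m[0]] = m[1]
        '
    }
    drop {}
# Lines with at least three commas are treated as CSV data.
} else if [message] =~ /,.*,.*,/ {
    csv {
        # Takes column names from the first CSV line seen; this also
        # needs a single pipeline worker to behave predictably.
        autodetect_column_names => true
    }
    ruby {
        # Attach the metadata collected so far to every CSV event.
        code => '
            event.set("metadata", @@metadata)
        '
    }
}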
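For the leftover lines mentioned above, a grok filter might look roughly like the sketch below. It is only an illustration: the field names (rtc_hours, rtc_mins, rtc_secs, flash_date, flash_hour, flash_minute) are made up for this example, and the patterns assume the lines always have exactly the wording shown in the samples. Each pattern is anchored with ^.

```
# Hypothetical field names; adjust to whatever you want in the document.
if [message] =~ /^The RTC is running/ {
    grok {
        match => {
            "message" => "^The RTC is running %{INT:rtc_hours:int} hours, %{INT:rtc_mins:int} mins and %{INT:rtc_secs:int} secs behind real time"
        }
    }
} else if [message] =~ /^The flash last updated/ {
    grok {
        match => {
            # The sample time has no seconds, so HOUR and MINUTE are
            # matched separately rather than using TIME.
            "message" => "^The flash last updated %{DATE:flash_date} at %{HOUR:flash_hour}:%{MINUTE:flash_minute}"
        }
    }
}
```

These branches would slot in as further else if clauses after the CSV branch. If the values need to appear on every CSV document as well, they could be stashed in the same class variable in the same way as the key=value pairs.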