I am using Logstash to parse test logs. Some test log files contain multiple errors. Only the last error in each file contains the "scenario" information, but I need all errors from the same file path to share the same "scenario" field data. I believe I can accomplish this using the mutate and aggregate filters, but I am not sure how to implement this solution. My intuition is to use the mutate filter to add the scenario field if it doesn't already exist, like this:
You say you want all errors to have the scenario field. I can think of two approaches to that.
The first is to index all the data, then go back and add the scenario to any documents that do not have it. You could do that using Logstash. Map a boolean field called something like scenarioAdded in the index, and have Logstash set it to false on every document it indexes (an Elasticsearch mapping does not fill in a value for a field a document lacks). Then run a second Logstash pipeline with an elasticsearch input that fetches all documents that have a [scenario] field and [scenarioAdded] set to false, and feed those to an http output that makes an update-by-query call to Elasticsearch to add [scenario] and set scenarioAdded to true for all documents with the same [log][file][path]. There is an update-by-query example here.
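A rough, untested sketch of that first approach, split into two pipelines. The index name `test-logs`, the host `localhost:9200`, and a mapping with `scenarioAdded` as `boolean` and `log.file.path` as `keyword` (as in ECS) are all assumptions; adjust them to your setup. The first fragment goes in the pipeline that already indexes the logs and sets the flag at ingest time. The second is a separate pipeline you run afterwards.

```
# --- Fragment for the existing ingest pipeline ---
filter {
    # Every document starts out as "scenario not yet propagated"
    ruby { code => 'event.set("scenarioAdded", false)' }
}

# --- Separate backfill pipeline, run after the logs are indexed ---
input {
    elasticsearch {
        hosts => ["localhost:9200"]   # assumed host
        index => "test-logs"          # hypothetical index name
        # Documents that carry a scenario which has not been copied yet
        query => '{
            "query": { "bool": { "filter": [
                { "exists": { "field": "scenario" } },
                { "term":   { "scenarioAdded": false } }
            ] } }
        }'
    }
}
output {
    http {
        url          => "http://localhost:9200/test-logs/_update_by_query"
        http_method  => "post"
        format       => "message"
        content_type => "application/json"
        # Copy this document's scenario onto every document from the same file,
        # including itself, and mark them all as done. sprintf does not JSON-escape
        # values, so paths containing quotes or backslashes would need escaping first.
        message => '{
            "query": { "term": { "log.file.path": "%{[log][file][path]}" } },
            "script": {
                "source": "ctx._source.scenario = params.scenario; ctx._source.scenarioAdded = true",
                "params": { "scenario": "%{scenario}" }
            }
        }'
    }
}
```

Because the scenario document itself matches the update-by-query, it is flagged too, so re-running the backfill pipeline only picks up scenario documents indexed since the previous run.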
The second is to use aggregate. You say the last error has the scenario, so in the aggregate filter you would need to stash all the errors in the map until you see the last one, attach them to that event, and then delete the map entry (which is what end_of_task does). The configuration below is not tested; it is just meant to give you a general idea of what I mean.
```
# The aggregate filter only works reliably with a single pipeline worker
# (pipeline.workers: 1, or -w 1), otherwise events can arrive out of order.
grok { match => { "message" => "Scenario: %{WORD:scenario}" } }
if ! [scenario] {
    aggregate {
        task_id => "%{[log][file][path]}"
        code => '
            # Assumes you only want to save the message field
            # If necessary you could go all the way to
            # map["errors"] << event.to_hash
            map["errors"] ||= []
            map["errors"] << event.get("message")
            event.cancel
        '
    }
} else {
    aggregate {
        task_id => "%{[log][file][path]}"
        code => '
            map["errors"] ||= []
            map["errors"] << event.get("message")
            event.set("errors", map["errors"])
        '
        end_of_task => true
        timeout => 300
    }
    # This event has a [scenario] field, so all those
    # created by split will have one too. Each split event holds one
    # stashed message in [errors]; add target => "message" if you want
    # it written to [message] instead.
    split { field => "errors" }
}
```
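The comment in the first aggregate block mentions stashing `event.to_hash` instead of only the message. A sketch of that variant, equally untested: a `ruby` filter after the split copies each stored hash back onto its event, so every re-emitted error keeps its original fields plus `[scenario]`. Fields whose names start with `@` (such as `@timestamp`) are skipped here to keep the copy simple, which means the re-emitted events carry the scenario event's `@timestamp`; treat that as an assumption you may want to revisit. It also needs a single pipeline worker.

```
grok {
    match => { "message" => "Scenario: %{WORD:scenario}" }
    tag_on_failure => []   # lines without a scenario are expected, so do not tag them
}
if ! [scenario] {
    aggregate {
        task_id => "%{[log][file][path]}"
        code => '
            # Stash the whole event, not just its message
            map["errors"] ||= []
            map["errors"] << event.to_hash
            event.cancel
        '
    }
} else {
    aggregate {
        task_id => "%{[log][file][path]}"
        code => '
            # Add the scenario event itself, then attach everything stashed so far
            map["errors"] ||= []
            map["errors"] << event.to_hash
            event.set("errors", map["errors"])
        '
        end_of_task => true
        timeout => 300
    }
    # One event per stored hash, each still carrying [scenario]
    split { field => "errors" }
    ruby {
        code => '
            # Copy the stored fields back onto the event. [scenario] survives
            # because the stashed hashes either lack it or hold the same value.
            stored = event.get("errors")
            stored.each { |key, value| event.set(key, value) unless key.start_with?("@") }
            event.remove("errors")
        '
    }
}
```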