I am reading the documentation on the Aggregate Filter, but all the examples assume the lines (events) to be aggregated share a common field, and that common field is usually also used as the task_id.
But my case is like this:
...
something happened on hostname a.b.c because this and that
...
the action was done by user joe because blah blah
...
From those log lines, I need Logstash to create a single document like:
```
{
  "hostname" => "a.b.c",
  "user"     => "joe"
}
```
Is there an easy way to do it?
Any tip is more than welcome.
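To make the question concrete, here is the kind of configuration I am imagining: since the lines share no common field, a mutate filter adds a constant field that every event can then use as the task_id. The grok patterns and the `task` field name are made up for the example, and I understand the aggregate filter requires running with `pipeline.workers: 1`:

```
filter {
  # hypothetical grok patterns; they would need to match the real log format
  grok {
    match => {
      "message" => [
        "something happened on hostname %{HOSTNAME:hostname} because %{GREEDYDATA:reason}",
        "the action was done by user %{USERNAME:user} because %{GREEDYDATA:reason}"
      ]
    }
  }

  # the lines share no common field, so give every event the same task_id
  mutate { add_field => { "task" => "all" } }

  aggregate {
    task_id => "%{task}"
    code => "
      map['hostname'] ||= event.get('hostname')
      map['user']     ||= event.get('user')
      event.cancel  # drop the per-line events; only the merged map is emitted
    "
    push_map_as_event_on_timeout => true
    timeout => 30  # flush whatever has been collected after 30 seconds
  }
}
```

Would something along those lines work, or is there a simpler approach?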
Follow-up question: is it possible to add a timeout? I am actually parsing five lines, not just the two I mentioned, and the 5th one for the current action may not yet be in the log file. So I would like to send the data I already have for the first 4 fields to the output if the 5th one does not arrive within, say, 30 seconds. Is that possible?
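Something like this is what I have in mind, with field1..field5 standing in for my real field names: the 5th line closes the task and ships the complete document immediately, while the timeout flushes whatever was collected if that line never shows up.

```
if [field5] {
  aggregate {
    task_id => "%{task}"
    code => '
      map["field5"] = event.get("field5")
      map.each { |k, v| event.set(k, v) }   # enrich the 5th-line event with everything collected
    '
    end_of_task => true                     # the complete document goes out right away
  }
} else {
  aggregate {
    task_id => "%{task}"
    code => '
      %w[field1 field2 field3 field4].each { |f| map[f] ||= event.get(f) }
      event.cancel                          # drop the partial per-line events
    '
    push_map_as_event_on_timeout => true
    timeout => 30                           # emit the partial map if line 5 never arrives
    timeout_tags => ["aggregate_timeout"]   # marks documents that went out incomplete
  }
}
```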
I am investigating one possible solution: chaining 2 pipelines. In the first one, I send to the output (tcp) both the JSON with the first 4 fields and, when possible, the JSON with all 5 fields. I read all of that in the second pipeline (tcp input). Now I hope I can figure out how to use the aggregate filter in the second pipeline to spot when I have both the partial and the complete event, and drop the partial one. Let's see...
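As a first sketch, the second pipeline could look like the following; the port is arbitrary (it just has to match the tcp output of the first pipeline), and this assumes only one action is in flight at a time:

```
# second pipeline: receives the partial (4-field) document and,
# when it exists, the complete (5-field) one from the first pipeline
input {
  tcp {
    port  => 5045          # placeholder; must match the first pipeline's tcp output
    codec => json_lines
  }
}

filter {
  # same trick as before: a constant field so both documents share a task_id
  mutate { add_field => { "task" => "all" } }

  aggregate {
    task_id => "%{task}"
    # merge whatever arrives into one map; the complete document simply
    # overwrites and extends the partial one
    code => '
      event.to_hash.each { |k, v| map[k] = v unless k.start_with?("@") }
      event.cancel
    '
    push_map_as_event_on_timeout => true
    timeout => 10          # short grace period to wait for the complete document
  }
}
```

Both incoming documents are cancelled, so the partial one never reaches the output on its own; a single merged document is pushed when the timeout fires.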