Logstash aggregate array of objects based on key field

I'm receiving some logs like these:

datetime="2020-09-20T14:40:00-0300" action="tunnel-up" tunneltype="ssl-tunnel" tunnelid=1171992131 remip=192.168.0.1 user="someuser" group="vpn_ssl_group"

and

datetime="2020-09-20T15:40:00-0300" action="tunnel-down" tunneltype="ssl-tunnel" tunnelid=1171992131 remip=192.168.0.1 user="someuser" group="vpn_ssl_group"

I'd like to aggregate the datetime and action fields, grouped by the tunnelid field, into an array of objects.

Final document should look like this:

{
    "datetime": "2020-09-20T14:40:00-0300",
    "tunnelid": 1171992131,
    "user": "someuser",
    "group": "vpn_ssl_group",
    "remip": "192.168.0.1",
    "logs": [{
        "datetime": "2020-09-20T14:40:00-0300",
        "action": "tunnel-up"
    },
    {
        "datetime": "2020-09-20T15:40:00-0300",
        "action": "tunnel-down"
    }],
    "duration": 3600
}

Is there any way I can do that? If so, can you please show me how?

Use an aggregate filter. Look at example 3 in the documentation.
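A sketch of what that could look like, assuming the key=value pairs are already parsed into event fields (for example with a kv filter); the field names follow your sample logs, and the timeout value is an assumption, not something from your requirements:

```
filter {
  kv { }   # parse the key=value pairs into event fields

  if [action] == "tunnel-up" {
    aggregate {
      task_id => "%{tunnelid}"
      map_action => "create"
      code => "
        map['datetime'] = event.get('datetime')
        map['logs'] = [ { 'datetime' => event.get('datetime'), 'action' => event.get('action') } ]
      "
    }
  }

  if [action] == "tunnel-down" {
    aggregate {
      task_id => "%{tunnelid}"
      map_action => "update"
      end_of_task => true
      timeout => 86400   # assumption: drop unmatched tunnels after a day
      code => "
        require 'time'
        map['logs'] << { 'datetime' => event.get('datetime'), 'action' => event.get('action') }
        # compute duration before overwriting datetime with the tunnel-up time
        event.set('duration', (Time.parse(event.get('datetime')) - Time.parse(map['datetime'])).to_i)
        event.set('datetime', map['datetime'])
        event.set('logs', map['logs'])
      "
    }
  }
}
```

The tunnel-down event already carries tunnelid, user, group, and remip from the kv parse, so enriching it with the map's datetime, logs array, and a computed duration produces a document shaped like your target.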

Make sure you have a single pipeline worker thread. Note that when the filter pushes an event based on the map, the only fields on the event will be what you added to the map.

Must I have a single pipeline worker for every configuration file, or can I specify it only for the one that has the aggregate filter? If it's possible to split them, can I have a configuration file that keeps the input from the main config, just adds the aggregate filter, and runs in a separate pipeline?

The pipeline which contains the aggregate filter must only use a single worker thread. If you need to maintain event order then that applies to other pipelines too.

However, if you are OK with events getting re-ordered, which I expect you are for your use case, then the other pipelines can have multiple workers.

You can specify the number of workers for each pipeline in pipelines.yml.
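For example (the pipeline ids and config paths are placeholders):

```
# pipelines.yml
- pipeline.id: aggregate
  path.config: "/etc/logstash/conf.d/aggregate.conf"
  pipeline.workers: 1
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/main.conf"
  pipeline.workers: 4
```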

If the aggregate pipeline feeds the multi-worker pipeline then it is not going to scale well. If the multi-worker pipeline does a lot of work on each event (especially expensive calls like http, geoip, or dns filters) and then feeds the aggregate pipeline it may scale quite well.
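A sketch of the arrangement that scales better, with the multi-worker pipeline feeding the single-worker aggregate pipeline; the pipeline address is a placeholder:

```
# main.conf (multi-worker pipeline: heavy filters, then forward)
output {
  pipeline { send_to => ["aggregate"] }
}

# aggregate.conf (single-worker pipeline)
input {
  pipeline { address => "aggregate" }
}
```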


Thank you for the feedback, Badger, it's been very helpful! Last question: would you recommend pipeline-to-pipeline communication for this use case? By the way, I'm currently using tcp as the input and elasticsearch as the output plugin in the single configuration file I have set up.

Test it, measure it. If you cannot see a difference between the two options then choose the simpler one.