Merge lines in filebeat + logstash

Hello everyone,

It seems to me that Filebeat sends an entry to Logstash for every line individually. What we want (if it is possible) is to group the lines together and then send them to Logstash.

So ... ,

Is it possible to merge lines in Filebeat before sending them to Logstash?

Example:

Say my log file production.log has entries that look like this:

Sending ... 0 .. 2016-02-17 13:20:13 +0530 
Sending ... 1 .. 2016-02-17 13:20:13 +0530 
Sending ... 2 .. 2016-02-17 13:20:14 +0530 
Sending ... 3 .. 2016-02-17 13:20:14 +0530 
Sending ... 4 .. 2016-02-17 13:20:14 +0530 
Sending ... 5 .. 2016-02-17 13:20:15 +0530 
Sending ... 6 .. 2016-02-17 13:20:15 +0530 
Sending ... 7 .. 2016-02-17 13:20:16 +0530 
Sending ... 8 .. 2016-02-17 13:20:16 +0530 
Sending ... 9 .. 2016-02-17 13:20:16 +0530 
Sending ... 10 .. 2016-02-17 13:20:17 +0530 
Sending ... 11 .. 2016-02-17 13:20:17 +0530 
Sending ... 12 .. 2016-02-17 13:20:18 +0530 
Sending ... 13 .. 2016-02-17 13:20:18 +0530 
Sending ... 14 .. 2016-02-17 13:20:18 +0530 
Sending ... 15 .. 2016-02-17 13:20:19 +0530 
Sending ... 16 .. 2016-02-17 13:20:19 +0530 
Sending ... 17 .. 2016-02-17 13:20:20 +0530 
Sending ... 18 .. 2016-02-17 13:20:20 +0530 
Sending ... 19 .. 2016-02-17 13:20:20 +0530 
Sending ... 20 .. 2016-02-17 13:20:21 +0530 
Sending ... 21 .. 2016-02-17 13:20:21 +0530 
Sending ... 22 .. 2016-02-17 13:20:22 +0530 
Sending ... 23 .. 2016-02-17 13:20:22 +0530 
Sending ... 24 .. 2016-02-17 13:20:22 +0530 
Sending ... 25 .. 2016-02-17 13:20:23 +0530 
Sending ... 26 .. 2016-02-17 13:20:23 +0530 
Sending ... 27 .. 2016-02-17 13:20:24 +0530 
Sending ... 28 .. 2016-02-17 13:20:24 +0530 
Sending ... 29 .. 2016-02-17 13:20:24 +0530 
Sending ... 30 .. 2016-02-17 13:20:25 +0530 
Sending ... 31 .. 2016-02-17 13:20:25 +0530 
Sending ... 32 .. 2016-02-17 13:20:26 +0530 
Sending ... 33 .. 2016-02-17 13:20:26 +0530 
Sending ... 34 .. 2016-02-17 13:20:26 +0530 
Sending ... 35 .. 2016-02-17 13:20:27 +0530 
Sending ... 36 .. 2016-02-17 13:20:27 +0530 
Sending ... 37 .. 2016-02-17 13:20:28 +0530 
Sending ... 38 .. 2016-02-17 13:20:28 +0530 
Sending ... 39 .. 2016-02-17 13:20:29 +0530 
Sending ... 40 .. 2016-02-17 13:20:29 +0530 
Sending ... 41 .. 2016-02-17 13:20:30 +0530

Now we want Filebeat to group (or rather, merge) them and then send them across to Logstash.

For example, with this configuration (unfortunately this does not work):

 ... ... 
     multiline:
         max_lines: 16

So the eventual events that get sent to Logstash/Elasticsearch would look like this:

Event 1 (with message as below) [message formatted for readability purposes]

Sending ... 0 .. 2016-02-17 13:20:13 +0530 
Sending ... 1 .. 2016-02-17 13:20:13 +0530 
Sending ... 2 .. 2016-02-17 13:20:14 +0530 
Sending ... 3 .. 2016-02-17 13:20:14 +0530 
Sending ... 4 .. 2016-02-17 13:20:14 +0530 
Sending ... 5 .. 2016-02-17 13:20:15 +0530 
Sending ... 6 .. 2016-02-17 13:20:15 +0530 
Sending ... 7 .. 2016-02-17 13:20:16 +0530 
Sending ... 8 .. 2016-02-17 13:20:16 +0530 
Sending ... 9 .. 2016-02-17 13:20:16 +0530 
Sending ... 10 .. 2016-02-17 13:20:17 +0530 
Sending ... 11 .. 2016-02-17 13:20:17 +0530 
Sending ... 12 .. 2016-02-17 13:20:18 +0530 
Sending ... 13 .. 2016-02-17 13:20:18 +0530 
Sending ... 14 .. 2016-02-17 13:20:18 +0530 
Sending ... 15 .. 2016-02-17 13:20:19 +0530 

Event 2 (with message as below)

Sending ... 16 .. 2016-02-17 13:20:19 +0530 
Sending ... 17 .. 2016-02-17 13:20:20 +0530 
Sending ... 18 .. 2016-02-17 13:20:20 +0530 
Sending ... 19 .. 2016-02-17 13:20:20 +0530 
Sending ... 20 .. 2016-02-17 13:20:21 +0530 
Sending ... 21 .. 2016-02-17 13:20:21 +0530 
Sending ... 22 .. 2016-02-17 13:20:22 +0530 
Sending ... 23 .. 2016-02-17 13:20:22 +0530 
Sending ... 24 .. 2016-02-17 13:20:22 +0530 
Sending ... 25 .. 2016-02-17 13:20:23 +0530 
Sending ... 26 .. 2016-02-17 13:20:23 +0530 
Sending ... 27 .. 2016-02-17 13:20:24 +0530 
Sending ... 28 .. 2016-02-17 13:20:24 +0530 
Sending ... 29 .. 2016-02-17 13:20:24 +0530 
Sending ... 30 .. 2016-02-17 13:20:25 +0530 
Sending ... 31 .. 2016-02-17 13:20:25 +0530 
Sending ... 32 .. 2016-02-17 13:20:26 +0530 

And so on ...

But unfortunately the above configuration (max_lines) doesn't work like I was expecting it to (see attached screenshot).

Here is what my Filebeat config looks like:

Btw, matching only a specific message with a regex is not what I'm looking for here.

Lastly, the Filebeat version is 1.1.0.

Thanks Everyone.

I think the configuration options you are looking for are spool_size and idle_timeout: https://www.elastic.co/guide/en/beats/filebeat/current/configuration-filebeat-options.html#_spool_size

Filebeat already automatically sends a bulk request for multiple lines based on these options.
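A minimal sketch of those options in filebeat.yml (the path and values here are just illustrative, not recommendations):

```yaml
filebeat:
  prospectors:
    - paths:
        - /var/log/production.log   # hypothetical path
      input_type: log
  # Spooler settings: events accumulate until either limit is reached,
  # then they are flushed to the output as one bulk request.
  spool_size: 1024    # flush after this many events...
  idle_timeout: 5s    # ...or after this much time, whichever comes first
```

Note that this only batches the network requests; each line still becomes its own event in Elasticsearch.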

The multiline feature is for things like Java Exceptions where one event covers multiple lines.

I assume you are looking for bulk requests and not merging events?

Multiline support in Filebeat requires regex patterns to be configured. The max_lines option ensures that a single multiline event includes at most max_lines lines (any additional lines are dropped).
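For example, a sketch for Filebeat 1.x (the path and timestamp pattern here are assumptions for illustration):

```yaml
filebeat:
  prospectors:
    - paths:
        - /var/log/app.log   # hypothetical path
      multiline:
        # Lines that do NOT start with a timestamp are treated as
        # continuations and appended to the previous event.
        pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
        negate: true
        match: after
        # At most 16 lines per event; anything beyond that is dropped.
        max_lines: 16
```

So max_lines is a safety cap on a pattern-defined event, not a "group every N lines" setting.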

@ruflin I tried that, no luck.

@steffens Agreed, but is there a way to do what I'm referring to?

@viren Still not sure if you are looking for multiline or bulk requests?

@ruflin
I can explain. Consider the following entries in my log file policy-router.log:

cat log/policy-router.log

Sending 1
Sending 2
Sending 3

The documents in Elasticsearch have messages that look like this:

{
  "_index": "policy-router-2016.02.18",
  "_type": "policy-router",
  "_id": "AVL1VlDbqUPAn0bUg_nr",
  "_source": {
    "message": "Sending 1",  <------  Only Single line.
    ... 
   ...
    "beat": {
       .. 
       ..
    },
    "count": 1,
    ...
    ...
}

All I need is for Filebeat to send the message (event) like this:

{
  "_index": "policy-router-2016.02.18",
  "_type": "policy-router",
  "_id": "AVL1VlDbqUPAn0bUg_nr",
  "_source": {
    "message": "Sending 1\n Sending 2\n Sending 3\n",  <------ Multi line.
    ... 
   ...
    "beat": {
       .. 
       ..
    },
    "count": 1,
    ...
    ...
}

And **I can't use a regex to group those messages** because the messages in the logs are generally unstructured.

That's all.

Thanks for the details. So we are talking about multiline :slightly_smiling:

The part I didn't understand yet is how filebeat should decide on how many events to combine together? All events so far in the file?

Well, I was hoping Filebeat would have some configuration to do that, but it seems (from your reply above) that it doesn't. So is there any way to do the same in Logstash, then?

I wish I could use a regex for the multiline stuff, but unfortunately I can't, since the logs are too unstructured.
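(For context, Logstash's multiline handling, e.g. the multiline codec, is also pattern-based, so it hits the same limitation with unstructured logs. A sketch with an assumed port and timestamp pattern:)

```
input {
  beats {
    port => 5044
    codec => multiline {
      # Pattern-based, like Filebeat: lines that do NOT start with a
      # timestamp are merged into the previous event.
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}
```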

@ruflin If I use a wildcard regex (.+) with the max_lines option (25 in my case), I see Filebeat sending multiline events but truncating anything after that (i.e. anything after line 25 is discarded).

Which is something I don't want either (i.e. part of the logs getting discarded).

As far as I understand, your logic is: combine 25 lines and send them out, then take the next 25 lines and send them, and the content of the 25 lines does not matter?

If that is the case, could you perhaps change the system that creates the log to write a "start" marker every 25 lines, which could then be detected?
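If that is possible, Filebeat could then key on that marker. A sketch, assuming the application writes a literal "-- start --" line every 25 lines (marker text and path are hypothetical):

```yaml
filebeat:
  prospectors:
    - paths:
        - /var/log/production.log   # hypothetical path
      multiline:
        # Each "-- start --" marker begins a new event; every line
        # up to the next marker is merged into that event.
        pattern: '^-- start --'
        negate: true
        match: after
```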

@ruflin Thanks for the help.