Filebeat merged 2 "normal" lines

Hello.

In general, our application writes one-line JSON log entries, but sometimes an entry is multiline JSON. To merge the multiline JSON I edited the Filebeat config as follows:

- input_type: log
  paths:
    - /var/log/application/eai-service.json
  json.message_key: message
  json.keys_under_root: true
  json.add_error_key: true
  json.overwrite_keys: true
  multiline:
    pattern: '^\{'
    negate: true
    match: after
  fields:
    tag: mp_eai
  fields_under_root: true

Example log:

{"@timestamp":"2018-07-18T17:15:02.082+04:00","@version":"1","message":"127.0.0.1 - - [2018-07-18T17:15:02.082+04:00] \"GET /admin/tasks?page=0&size=20&sort=created,desc HTTP/1.0\" 200 21655","method":"GET","protocol":"HTTP/1.0","status_code":200,"requested_url":"GET /admin/tasks?page=0&size=20&sort=created,desc HTTP/1.0","requested_uri":"/admin/tasks","remote_host":"127.0.0.1","content_length":21655,"elapsed_time":282}
{"@timestamp":"2018-07-18T17:15:04.269+04:00","@version":"1","message":"127.0.0.1 - - [2018-07-18T17:15:04.269+04:00] \"GET /admin/tasks?page=0&size=20&sort=created,desc HTTP/1.0\" 200 21655","method":"GET","protocol":"HTTP/1.0","status_code":200,"requested_url":"GET /admin/tasks?page=0&size=20&sort=created,desc HTTP/1.0","requested_uri":"/admin/tasks","remote_host":"127.0.0.1","content_length":21655,"elapsed_time":92}

As a result, these two log entries were merged into one; in Elasticsearch it looks like this:

{
  "_index": "mp_eai-6.3.0-2018.29",
  "_type": "doc",
  "_id": "416HrWQBchTaZXe-Fi1s",
  "_version": 1,
  "_score": null,
  "_source": {
    "content_length": 21655,
    "tag": "mp_eai",
    "requested_uri": "/admin/tasks",
    "status_code": 200,
    "remote_host": "127.0.0.1",
    "protocol": "HTTP/1.0",
    "source": "/var/log/application/eai-service.json",
    "method": "GET",
    "host": {
      "name": "appserver"
    },
    "beat": {
      "name": "server1",
      "hostname": "server1",
      "version": "6.3.0"
    },
    "elapsed_time": 92,
    "@timestamp": "2018-07-18T13:15:04.269Z",
    "offset": 3330035,
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "message": "127.0.0.1 - - [2018-07-18T17:15:02.082+04:00] \"GET /admin/tasks?page=0&size=20&sort=created,desc HTTP/1.0\" 200 21655\n127.0.0.1 - - [2018-07-18T17:15:04.269+04:00] \"GET /admin/tasks?page=0&size=20&sort=created,desc HTTP/1.0\" 200 21655",
    "@version": "1",
    "requested_url": "GET /admin/tasks?page=0&size=20&sort=created,desc HTTP/1.0"
  },
  "fields": {
    "@timestamp": [
      "2018-07-18T13:15:04.269Z"
    ]
  },
  "sort": [
    1531919704269
  ]
}

Logstash output:

output {

  if [tag] == "mp_eai" {
    elasticsearch {
      hosts => ["elkserver1:9200", "elkserver2:9200", "elkserver3:9200"]
      user => logstash_internal
      password => testpassword
      sniffing => true
      manage_template => false
      index => "mp_eai-%{[@metadata][version]}-%{+xxxx.ww}"
    }
  }

  stdout {
    codec => rubydebug
  }
}

Could you please advise what's wrong?

I'm not sure why the json options aren't working as expected, but using a processor does.

For example:

filebeat.inputs:
- type: tcp
  host: "localhost:7070"
  multiline:
    pattern: '^\{'
    negate: true
    match: after
  fields:
    tag: mp_eai
  processors:
    - decode_json_fields:
        fields: ["message"]

and running head multi.json | nc localhost 7070, where multi.json contains your sample data.
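If you want to see exactly what Filebeat publishes during a test like this, it can help to temporarily switch to the console output (a debugging suggestion, not part of your setup; only one output can be enabled at a time, so your regular output has to be commented out while testing):

output.console:
  pretty: true

Running filebeat -e -c filebeat.yml then prints every published event to stdout, so you can check whether events were merged before they ever reach Logstash.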

Hi Andrew, thanks for the reply.

Actually, I'm wondering why the message fields were merged.
By the way, I can't reproduce the issue now.

Could you please confirm that the config I've shown will merge a multiline entry, e.g.:

{
  "@timestamp": "2018-07-18T17:15:04.269+04:00",
  "@version": "1",
  "message": "127.0.0.1 - - [2018-07-18T17:15:04.269+04:00] \"GET /admin/tasks?page=0&size=20&sort=created,desc HTTP/1.0\" 200 21655",
  "method": "GET",
  "protocol": "HTTP/1.0",
  "status_code": 200,
  "requested_url": "GET /admin/tasks?page=0&size=20&sort=created,desc HTTP/1.0",
  "requested_uri": "/admin/tasks",
  "remote_host": "127.0.0.1",
  "content_length": 21655,
  "elapsed_time": 92
}

The reason the processor works and the json options do not is the order of execution. Multiline is applied after the json decoder and is part of the reader / harvester. Processors are only applied after the data has been collected by the harvester, so they run after multiline.
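For the original file-based input, the same ordering can be used deliberately: let multiline reassemble the pretty-printed entries inside the harvester and decode the result afterwards with the decode_json_fields processor. Below is a minimal sketch of that combination, not a tested config: target and overwrite_keys are meant to mirror your json.keys_under_root / json.overwrite_keys settings, and per-input processors may not be available on older Filebeat versions (if not, define the processor in the top-level processors section instead):

filebeat.inputs:
- type: log
  paths:
    - /var/log/application/eai-service.json
  # multiline runs first, inside the harvester: any line that does not
  # start with '{' is appended to the preceding line
  multiline:
    pattern: '^\{'
    negate: true
    match: after
  fields:
    tag: mp_eai
  fields_under_root: true
  # processors run only after the harvester has assembled the event,
  # i.e. after multiline, so they always see a complete JSON object
  processors:
    - decode_json_fields:
        fields: ["message"]
        target: ""            # merge the decoded keys into the event root
        overwrite_keys: true  # let decoded fields replace existing ones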
