Multiline JSON not importing to fields in ElasticSearch - do I need Logstash?

warburtron · May 31, 2019, 9:30am

I've tried really (really) hard to sort this before asking for help here, so I'm desperately hoping someone can help as it's driving me crazy! Fair warning... I'm still pretty new to ELK so there's a good chance I'm missing some basics here.

What I'm trying to do
Import lots of JSON files into Elasticsearch. Each 'log' entry is a completely separate JSON file, and the contents of that file are multi-line. These aren't exactly logs, either... each JSON file is the result of an SSL scan I'm running on internal hosts using the tool SSLyze: https://github.com/nabla-c0d3/sslyze/

The resulting JSON files are large (and probably causing other issues, which I'll post about separately), but a heavily trimmed version is as follows:

{
    "accepted_targets": [
        {
            "server_info": {
                "client_auth_credentials": null, 
                "client_auth_requirement": "DISABLED", 
                "highest_ssl_version_supported": "TLSV1_2", 
                "hostname": "test-server.lan", 
                "http_tunneling_settings": null, 
                "ip_address": "1.1.1.1", 
                "openssl_cipher_string_supported": "ECDHE-RSA-AES128-GCM-SHA256", 
                "port": 443, 
                "tls_server_name_indication": "test-server.lan", 
                "tls_wrapped_protocol": "PLAIN_TLS", 
                "xmpp_to_hostname": null
            }
        }
    ], 
    "invalid_targets": [], 
    "sslyze_url": "https://github.com/nabla-c0d3/sslyze", 
    "sslyze_version": "1.4.3", 
    "total_scan_time": "6.27773499489"
}

The problem
Although I can import these JSON files into Elasticsearch just fine, the multiline JSON document (as you can see above) is imported as one big 'blob' in the message field.

What I want
I need the JSON keys to show up as fields in Elasticsearch. (In a perfect world I'd select which JSON keys to convert to fields, as having every single one would likely be too much.) This would allow me to index, search and report on, for example, all hosts that are negotiating old ciphers.
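If the goal is to keep only a handful of keys per host, that selection can also be done before the data ever reaches the Beats pipeline. Below is a minimal Python sketch. It assumes the SSLyze 1.4.3 layout shown above (`accepted_targets[].server_info`), and `WANTED_KEYS` and `select_fields` are hypothetical names chosen for illustration:

```python
from __future__ import annotations

import json
import sys

# Hypothetical selection of keys from each accepted target's "server_info"
# block; swap in whichever SSLyze fields you actually want to search on.
WANTED_KEYS = (
    "hostname",
    "ip_address",
    "port",
    "highest_ssl_version_supported",
    "openssl_cipher_string_supported",
)


def select_fields(scan: dict) -> list[dict]:
    """Return one flat dict per accepted target, keeping only WANTED_KEYS."""
    rows = []
    for target in scan.get("accepted_targets", []):
        info = target.get("server_info", {})
        # Missing keys come back as None rather than raising KeyError.
        row = {key: info.get(key) for key in WANTED_KEYS}
        # Copy a top-level field onto each row so it stays self-describing.
        row["sslyze_version"] = scan.get("sslyze_version")
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Usage: python select_fields.py scan1.json scan2.json ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for row in select_fields(json.load(f)):
                print(json.dumps(row))
```

Emitting one flat object per host means that a search on, say, `openssl_cipher_string_supported` only has to look at top-level fields instead of digging through nested arrays.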

What I've tried
I originally had the following setup: Filebeat > Logstash > Elasticsearch. I've since removed Logstash because it wasn't helping me (due to my own inexperience): although the files were making their way into ES, they weren't formatted any better than when going directly from Filebeat to ES.

My current filebeat.yml is as follows (you can see from the commented-out lines that I've tried multiple configs, all to no avail):

filebeat.config.modules:
  path: /etc/filebeat/modules.d/*.yml

filebeat.prospectors:
- paths:
    - /home/ubuntu/sslyze/test/*.json
#  document_type: sslscanning
#  json.keys_under_root: true
#  json.add_error_key: false
#  json.message_key: accepted_targets
#  json.overwrite_keys: true
  multiline.pattern: '^{'
  multiline.negate: true
  multiline.match: after
  multiline.max_lines: 5000
  multiline.timeout: 10

processors:
 - decode_json_fields:
     fields: ['message']
     target: ""
     process_array: true
     max_depth: 8
     overwrite_keys: true

output.elasticsearch:
  hosts: ["9.9.9.9:9200"]
  template.name: filebeat
  template.path: filebeat.template.json

I've tried different 'fields' under the decode_json_fields processor, different targets, and multiline on and off in various flavours with different pattern matches. Nothing I do seems to make a difference (or rather, make it any better... I've made things plenty worse in my playing around).
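To see how the multiline settings above group lines, here is a simplified Python model of `multiline.negate: true` with `multiline.match: after`. It is an approximation written for illustration, not Filebeat's actual code, and it ignores `multiline.timeout`. A line matching the pattern starts a new event, and every following non-matching line is appended to it:

```python
import re
from typing import Iterable, Iterator


def group_multiline(lines: Iterable[str], pattern: str = r"^\{",
                    max_lines: int = 5000) -> Iterator[str]:
    """Approximate multiline.negate: true + multiline.match: after.

    A line matching `pattern` begins a new event; each following line that
    does not match is appended to the event in progress. Lines beyond
    `max_lines` are discarded. multiline.timeout is not modelled.
    The default pattern is the config's '^{' with the brace escaped.
    """
    regex = re.compile(pattern)
    current = []  # lines of the event currently being built
    for line in lines:
        if regex.search(line) and current:
            yield "\n".join(current)  # a new start line closes the old event
            current = []
        if len(current) < max_lines:
            current.append(line)
    if current:
        yield "\n".join(current)  # flush whatever is left at end of input
```

With the SSLyze layout, only the very first line starts with `{` in column 0 (nested objects are indented), so `'^{'` should yield one event per file. The grouping can only work with the lines the reader has actually delivered, though.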

FWIW I'm on Filebeat 6.8 and Elasticsearch 6.8

Anything else?
I'm assuming that I'm going about this the correct way, of course. Maybe I should be using Logstash to manipulate the JSON doc? Maybe importing a big JSON blob into Elasticsearch is okay and I should just be using the tools within Kibana to expose the data I want?

If I need to, I could probably create a simple Python script to reformat the JSON files (if that would help?) but I'm hoping that Filebeat (or Logstash??) will be able to import these in the format I need.
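For the Python route, here is a minimal sketch that assumes each input file holds exactly one JSON document as SSLyze writes it. It rewrites every file as a single compact line ending in a newline (newline-delimited JSON). `to_ndjson_line`, `convert_dir` and the output filename are hypothetical:

```python
import json
import sys
from pathlib import Path


def to_ndjson_line(pretty_text: str) -> str:
    """Re-serialise one pretty-printed JSON document as a single line.

    json.loads() ignores indentation and newlines between tokens, and
    json.dumps() with compact separators writes everything on one line.
    The trailing newline marks the line as complete for a line reader.
    """
    doc = json.loads(pretty_text)
    return json.dumps(doc, separators=(",", ":")) + "\n"


def convert_dir(src_dir: str, out_file: str) -> int:
    """Append every *.json file in src_dir to out_file, one document per line."""
    count = 0
    with open(out_file, "a", encoding="utf-8") as out:
        for path in sorted(Path(src_dir).glob("*.json")):
            out.write(to_ndjson_line(path.read_text(encoding="utf-8")))
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python to_ndjson.py /home/ubuntu/sslyze/test scans.ndjson
    print(f"wrote {convert_dir(sys.argv[1], sys.argv[2])} documents")
```

With one document per line, I believe the multiline settings become unnecessary, and the `json.keys_under_root: true` option already present (commented out) in the config above could decode each line directly. Treat that as something to verify on 6.8 rather than a tested config.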

Any help very much appreciated.

kvch · June 3, 2019, 8:19am

With the following configuration I was able to parse the JSON correctly (at least I think so).

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - test.log
  multiline.pattern: '^{'
  multiline.negate: true
  multiline.match: after
  multiline.max_lines: 5000
  multiline.timeout: 10

processors:
- decode_json_fields:
    fields: ['message']
    target: ""
    process_array: true
    max_depth: 8
    overwrite_keys: true
- drop_fields:
    fields: ['message']

Event published:

2019-06-03T10:16:03.349+0200    DEBUG   [processors]    processing/processors.go:183    Publish event: {
  "@timestamp": "2019-06-03T08:15:53.347Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "8.0.0"
  },
  "host": {
    "name": "sleipnir"
  },
  "agent": {
    "hostname": "sleipnir",
    "id": "e2887f41-5f2c-4f5e-b655-e7142da3386c",
    "version": "8.0.0",
    "type": "filebeat",
    "ephemeral_id": "007f541c-0feb-4790-b230-9d143573bba6"
  },
  "log": {
    "offset": 0,
    "file": {
      "path": "/home/n/go/src/github.com/elastic/beats/filebeat/test.log"
    },
    "flags": [
      "multiline"
    ]
  },
  "input": {
    "type": "log"
  },
  "ecs": {
    "version": "1.0.0"
  },
  "accepted_targets": [
    {
      "server_info": {
        "client_auth_credentials": null,
        "ip_address": "1.1.1.1",
        "openssl_cipher_string_supported": "ECDHE-RSA-AES128-GCM-SHA256",
        "tls_server_name_indication": "test-server.lan",
        "tls_wrapped_protocol": "PLAIN_TLS",
        "xmpp_to_hostname": null,
        "client_auth_requirement": "DISABLED",
        "highest_ssl_version_supported": "TLSV1_2",
        "hostname": "test-server.lan",
        "http_tunneling_settings": null,
        "port": 443
      }
    }
  ],
  "invalid_targets": [],
  "sslyze_url": "https://github.com/nabla-c0d3/sslyze",
  "sslyze_version": "1.4.3",
  "total_scan_time": "6.27773499489"
}

Is this what you would like to get as a result?

warburtron · June 4, 2019, 11:24am

Thanks @kvch, I'll check this out shortly and get back to you. Thanks for taking the time to look into it!

warburtron · June 13, 2019, 5:11pm

So I've been playing around with this and still not getting any improvement. I'm ignoring Elasticsearch, for now, and just outputting to file to simplify things.

Essentially, the JSON fields are not being parsed correctly and are not being inserted into the top level of the output file.

If I include drop_fields then my file output is essentially blank (aside from the metadata, such as host, @timestamp, etc.). If I don't include drop_fields then I still get the 'message' field with the whole payload as one blob:

{  
   "@timestamp":"2019-06-13T17:01:01.075Z",
   "@metadata":{  
      "beat":"filebeat",
      "type":"doc",
      "version":"6.8.0"
   },
   "message":"{\n    \"accepted_targets\": [\n        {\n            \"server_info\": {\n                \"client_auth_credentials\": null, \n                \"client_auth_requirement\": \"DISABLED\", \n                \"highest_ssl_version_supported\": \"TLSV1_2\", \n                \"hostname\": \"server.lan\", \n                \"http_tunneling_settings\": null, \n                \"ip_address\": \"1.1.1.1\", \n                \"openssl_cipher_string_supported\": \"ECDHE-RSA-AES128-GCM-SHA256\", \n                \"port\": 443, \n                \"tls_server_name_indication\": \"f5.com\", \n                \"tls_wrapped_protocol\": \"PLAIN_TLS\", \n                \"xmpp_to_hostname\": null\n            }\n        }\n    ], \n    \"invalid_targets\": [], \n    \"sslyze_url\": \"https://github.com/nabla-c0d3/sslyze\", \n    \"sslyze_version\": \"1.4.3\", \n    \"total_scan_time\": \"6.27773499489\"",
   "source":"/home/ubuntu/sslyze/test/test_scan.json",
   "offset":0,
   "host":{  
      "name":"tlsscanner"
   }
}
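The `message` above stops at `"total_scan_time": "6.27773499489"` with no closing `}`, so it is not valid JSON. The rough Python model of kvch's processor chain below shows why that would leave an almost empty event once `drop_fields` runs. It assumes, without confirmation from this thread, that `decode_json_fields` leaves the event untouched when decoding fails. `decode_then_drop` is a hypothetical name:

```python
import json


def decode_then_drop(event: dict) -> dict:
    """Rough model of decode_json_fields (target: "") followed by drop_fields.

    Assumption: when "message" is not valid JSON the decode step leaves the
    event as it is, so drop_fields then removes the only payload it had.
    """
    event = dict(event)  # work on a copy
    try:
        decoded = json.loads(event.get("message", ""))
    except json.JSONDecodeError:
        decoded = None
    if isinstance(decoded, dict):
        event.update(decoded)  # target: "" + overwrite_keys: merge into root
    event.pop("message", None)  # drop_fields: ['message']
    return event


complete = '{\n    "sslyze_version": "1.4.3", \n    "total_scan_time": "6.27773499489"\n}'
truncated = complete[: complete.rindex("\n")]  # the final "}" line is missing

print(decode_then_drop({"host": "tlsscanner", "message": complete}))
# {'host': 'tlsscanner', 'sslyze_version': '1.4.3', 'total_scan_time': '6.27773499489'}
print(decode_then_drop({"host": "tlsscanner", "message": truncated}))
# {'host': 'tlsscanner'}
```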

I wonder whether the \n line breaks and spaces are causing issues?
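On the `\n` question: whitespace between JSON tokens, including newlines and indentation, is insignificant, so pretty-printing by itself shouldn't stop decoding. The `\n` sequences in the output file are simply how real newlines inside the `message` string look once the whole event is serialised as JSON. Here is a quick Python check using fragments of the SSLyze sample:

```python
import json

# Fragments of the SSLyze sample, once pretty-printed and once on one line.
pretty = '{\n    "port": 443, \n    "invalid_targets": [], \n    "xmpp_to_hostname": null\n}'
compact = '{"port": 443, "invalid_targets": [], "xmpp_to_hostname": null}'

# Newlines, indentation and the trailing spaces after commas are all
# insignificant JSON whitespace, so both decode to the same object.
assert json.loads(pretty) == json.loads(compact)

# Serialising an event that carries the pretty text shows the same \n
# escapes that appear in the Filebeat file output.
print(json.dumps({"message": pretty}))
```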

I've tried feeding Filebeat the same test JSON file with all of the carriage returns and line feeds removed (so everything is on one very long line), but when I do this Filebeat fails to output anything at all.
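One plausible explanation, based on my understanding of Filebeat's log input rather than anything confirmed in this thread, is that Filebeat only publishes a line once it has read the terminating newline. If so, a file whose last line has no trailing newline never has that line shipped. That would fit both symptoms: the single-line file produces nothing, and the pretty-printed file loses its final `}`. Below is a small sketch that appends a newline where one is missing. `ensure_trailing_newline` and `fix_directory` are hypothetical helpers:

```python
from pathlib import Path


def ensure_trailing_newline(path: Path) -> bool:
    """Append a newline to `path` if its last byte is not one already.

    Returns True when the file was changed. Empty files are left alone.
    """
    with open(path, "rb+") as f:
        f.seek(0, 2)  # move to end of file
        if f.tell() == 0:
            return False
        f.seek(-1, 2)  # step back onto the last byte
        if f.read(1) == b"\n":
            return False
        f.seek(0, 2)  # make sure we write at the very end
        f.write(b"\n")
        return True


def fix_directory(directory: str, pattern: str = "*.json") -> list:
    """Apply ensure_trailing_newline to each matching file; return changed paths."""
    return [p for p in sorted(Path(directory).glob(pattern)) if ensure_trailing_newline(p)]
```

Run it after SSLyze has finished writing and before Filebeat picks the files up, e.g. `fix_directory("/home/ubuntu/sslyze/test")`.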

Any ideas? Do I need to manually sanitise the JSON file before sending it to Filebeat?

Does anyone have a sample JSON file which IS correctly processed, so I can test it myself and make sure I'm not going entirely mad!?

system · July 11, 2019, 5:11pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
