Multiline JSON not importing to fields in Elasticsearch - do I need Logstash?

I've tried really (really) hard to sort this out before asking for help here, so I'm desperately hoping someone can help, as it's driving me crazy! Fair warning... I'm still pretty new to ELK, so there's a good chance I'm missing some basics here.

What I'm trying to do
Import lots of JSON files into Elasticsearch. Each 'log' entry is a completely separate JSON file, and the contents of that file are multi-line. These aren't exactly logs, either... each JSON file is the result of an SSL scan I'm running on internal hosts using the tool SSLyze: https://github.com/nabla-c0d3/sslyze/

The resulting JSON files are large (and probably causing other issues, which I'll post about separately), but a heavily trimmed version is as follows:

{
    "accepted_targets": [
        {
            "server_info": {
                "client_auth_credentials": null, 
                "client_auth_requirement": "DISABLED", 
                "highest_ssl_version_supported": "TLSV1_2", 
                "hostname": "test-server.lan", 
                "http_tunneling_settings": null, 
                "ip_address": "1.1.1.1", 
                "openssl_cipher_string_supported": "ECDHE-RSA-AES128-GCM-SHA256", 
                "port": 443, 
                "tls_server_name_indication": "test-server.lan", 
                "tls_wrapped_protocol": "PLAIN_TLS", 
                "xmpp_to_hostname": null
            }
        }
    ], 
    "invalid_targets": [], 
    "sslyze_url": "https://github.com/nabla-c0d3/sslyze", 
    "sslyze_version": "1.4.3", 
    "total_scan_time": "6.27773499489"
}

The problem
Although I can import these JSON files into Elasticsearch just fine, the multiline JSON document (as you can see above) is imported as one big 'blob' in the message field.

What I want
I need to have the JSON keys show up as fields in Elasticsearch. (In a perfect world I'd select which JSON keys to convert to fields, as having every single one would likely be too much.) This would allow me to index, easily search and report on, for example, all hosts that were negotiating old ciphers.
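
Whichever tool ends up doing it, choosing keys amounts to flattening each entry in accepted_targets into one flat record. A small Python sketch of that idea, assuming the SSLyze layout shown above (server_info nested inside each accepted target); WANTED_KEYS and extract_fields are hypothetical names, and the particular keys are only an example:

```python
import json

# Hypothetical selection of server_info keys to promote to top-level fields.
WANTED_KEYS = (
    "hostname",
    "ip_address",
    "port",
    "highest_ssl_version_supported",
    "openssl_cipher_string_supported",
)


def extract_fields(scan, wanted=WANTED_KEYS):
    """Return one flat dict per accepted target, keeping only the wanted keys."""
    records = []
    for target in scan.get("accepted_targets", []):
        info = target.get("server_info", {})
        record = {key: info.get(key) for key in wanted}
        # Carry a scan-level value along so each record stands on its own.
        record["sslyze_version"] = scan.get("sslyze_version")
        records.append(record)
    return records


sample = json.loads("""
{
    "accepted_targets": [
        {"server_info": {"hostname": "test-server.lan", "ip_address": "1.1.1.1",
                         "port": 443, "highest_ssl_version_supported": "TLSV1_2",
                         "openssl_cipher_string_supported": "ECDHE-RSA-AES128-GCM-SHA256"}}
    ],
    "invalid_targets": [],
    "sslyze_version": "1.4.3"
}
""")

for record in extract_fields(sample):
    print(record)
```

One record per target means a scan covering several hosts would become several small documents, which is usually easier to filter on (e.g. by openssl_cipher_string_supported) than one large nested one.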

What I've tried
I originally had the following set-up: Filebeat > Logstash > Elasticsearch. I've since removed Logstash, as it wasn't helping me (due to my own inexperience): although the files were making their way into ES, they weren't formatted any better than when going directly from Filebeat to ES.

My current filebeat.yml is as follows (you can see from the commented-out lines that I've tried multiple configs, all to no avail):

filebeat.config.modules:
  path: /etc/filebeat/modules.d/*.yml

filebeat.prospectors:
- paths:
    - /home/ubuntu/sslyze/test/*.json
#  document_type: sslscanning
#  json.keys_under_root: true
#  json.add_error_key: false
#  json.message_key: accepted_targets
#  json.overwrite_keys: true
  multiline.pattern: '^{'
  multiline.negate: true
  multiline.match: after
  multiline.max_lines: 5000
  multiline.timeout: 10

processors:
 - decode_json_fields:
     fields: ['message']
     target: ""
     process_array: true
     max_depth: 8
     overwrite_keys: true

output.elasticsearch:
  hosts: ["9.9.9.9:9200"]
  template.name: filebeat
  template.path: filebeat.template.json

I've tried different 'fields' under the decode_json_fields processor, I've tried different targets, and I've tried multiline on and off, in various flavours with different pattern matches. Nothing I do seems to make a difference (or, rather, make it any better... I've made things plenty worse in my playing around!).
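
For reference, multiline.negate: true combined with multiline.match: after means that consecutive lines which do not match multiline.pattern are appended to the previous line that does. A rough Python model of that grouping is below; group_multiline is a hypothetical name, it ignores multiline.max_lines and multiline.timeout, and the pattern is written as ^\{ (escaped for clarity, same meaning as '^{'):

```python
import json
import re


def group_multiline(lines, pattern=r"^\{"):
    """Rough model of multiline.negate: true + multiline.match: after.

    A line matching the pattern starts a new event; every following line that
    does not match is appended to it. max_lines and timeout are ignored here.
    """
    regex = re.compile(pattern)
    events, current = [], []
    for line in lines:
        if regex.search(line) and current:
            events.append("\n".join(current))  # close the previous event
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events


sample_lines = [
    "{",
    '    "sslyze_version": "1.4.3",',
    '    "total_scan_time": "6.27773499489"',
    "}",
]
events = group_multiline(sample_lines)
print(len(events))                              # 1
print(json.loads(events[0])["sslyze_version"])  # 1.4.3
```

Because the closing } does not match '^{', it is appended to the same event as the opening brace, so in this model one pretty-printed file becomes one complete JSON event.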

FWIW, I'm on Filebeat 6.8 and Elasticsearch 6.8.

Anything else?
I'm assuming that I'm going about this the correct way, of course. Maybe I should be using Logstash to manipulate the JSON doc? Maybe importing a big JSON blob into Elasticsearch is okay and I should just be using the tools within Kibana to expose the data I want?

If I need to, I could probably create a simple Python script to reformat the JSON files (if that would help?), but I'm hoping that Filebeat (or Logstash??) will be able to import these in the format I need.
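
If the files do need reformatting, a minimal Python sketch along these lines would rewrite each pretty-printed SSLyze result as one compact JSON document on a single line. The script name, the compact_json_line and convert_dir helpers, and the output directory are hypothetical. The trailing newline is deliberate: Filebeat uses the newline character to detect the end of a line.

```python
import json
import sys
from pathlib import Path


def compact_json_line(text):
    """Parse one (possibly pretty-printed) JSON document and return it as a
    single compact line terminated by a newline."""
    return json.dumps(json.loads(text), separators=(",", ":")) + "\n"


def convert_dir(in_dir, out_dir):
    """Write a one-line copy of every *.json file in in_dir to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(in_dir.glob("*.json")):
        line = compact_json_line(path.read_text(encoding="utf-8"))
        (out_dir / path.name).write_text(line, encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation:
    #   python flatten_scans.py /home/ubuntu/sslyze/test /home/ubuntu/sslyze/flat
    convert_dir(Path(sys.argv[1]), Path(sys.argv[2]))
```

With one document per line, the multiline settings should no longer be needed, and Filebeat's json.* input options (json.keys_under_root, json.add_error_key and so on, as in the commented-out lines of the config above) can decode each line as it is read. Writing to a separate directory keeps the original scan output untouched.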

Any help very much appreciated.

With the following configuration I was able to parse the JSON correctly (at least I think so).

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - test.log
  multiline.pattern: '^{'
  multiline.negate: true
  multiline.match: after
  multiline.max_lines: 5000
  multiline.timeout: 10

processors:
- decode_json_fields:
    fields: ['message']
    target: ""
    process_array: true
    max_depth: 8
    overwrite_keys: true
- drop_fields:
    fields: ['message']

Event published:

2019-06-03T10:16:03.349+0200    DEBUG   [processors]    processing/processors.go:183    Publish event: {
  "@timestamp": "2019-06-03T08:15:53.347Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "8.0.0"
  },
  "host": {
    "name": "sleipnir"
  },
  "agent": {
    "hostname": "sleipnir",
    "id": "e2887f41-5f2c-4f5e-b655-e7142da3386c",
    "version": "8.0.0",
    "type": "filebeat",
    "ephemeral_id": "007f541c-0feb-4790-b230-9d143573bba6"
  },
  "log": {
    "offset": 0,
    "file": {
      "path": "/home/n/go/src/github.com/elastic/beats/filebeat/test.log"
    },
    "flags": [
      "multiline"
    ]
  },
  "input": {
    "type": "log"
  },
  "ecs": {
    "version": "1.0.0"
  },
  "accepted_targets": [
    {
      "server_info": {
        "client_auth_credentials": null,
        "ip_address": "1.1.1.1",
        "openssl_cipher_string_supported": "ECDHE-RSA-AES128-GCM-SHA256",
        "tls_server_name_indication": "test-server.lan",
        "tls_wrapped_protocol": "PLAIN_TLS",
        "xmpp_to_hostname": null,
        "client_auth_requirement": "DISABLED",
        "highest_ssl_version_supported": "TLSV1_2",
        "hostname": "test-server.lan",
        "http_tunneling_settings": null,
        "port": 443
      }
    }
  ],
  "invalid_targets": [],
  "sslyze_url": "https://github.com/nabla-c0d3/sslyze",
  "sslyze_version": "1.4.3",
  "total_scan_time": "6.27773499489"
}

Is this what you would like to get as a result?
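
kvch's configuration chains two processors: decode_json_fields with target: "" merges the decoded keys into the root of the event, and drop_fields then removes the original message string. A rough Python model of that sequence is below. The function names mirror the processor names but are hypothetical, and the model assumes, as a simplification, that a message which is not valid JSON is simply left in place by the decode step rather than dropping the event.

```python
import json


def decode_json_fields(event, fields=("message",), target=""):
    """Rough model of decode_json_fields with target: "" and overwrite_keys: true.

    Assumption: a field that does not hold valid JSON is left unchanged.
    """
    for field in fields:
        raw = event.get(field)
        if not isinstance(raw, str):
            continue
        try:
            decoded = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not decodable: leave the event as it is
        if target == "":
            if isinstance(decoded, dict):
                event.update(decoded)  # merge into the root, overwriting clashes
        else:
            event[target] = decoded
    return event


def drop_fields(event, fields=("message",)):
    """Remove the listed fields if present."""
    for field in fields:
        event.pop(field, None)
    return event


good = {"host": {"name": "tlsscanner"},
        "message": '{\n  "sslyze_version": "1.4.3",\n  "invalid_targets": []\n}'}
bad = {"host": {"name": "tlsscanner"},
       "message": '{\n  "sslyze_version": "1.4.3",\n  "invalid_targets": []'}  # no closing brace

print(drop_fields(decode_json_fields(good)))
# {'host': {'name': 'tlsscanner'}, 'sslyze_version': '1.4.3', 'invalid_targets': []}
print(drop_fields(decode_json_fields(bad)))
# {'host': {'name': 'tlsscanner'}}
```

Under that assumption, a message that cannot be parsed contributes no fields, so once drop_fields runs the event is left with only its metadata.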

Thanks @kvch, I'll check this out shortly and get back to you. Thanks for taking the time to look into it!

So I've been playing around with this and am still not getting any improvement. I'm ignoring Elasticsearch for now and just outputting to a file to simplify things.

Essentially, the JSON fields are not being parsed correctly, and they are not being inserted into the top level of the output file.

If I include drop_fields, my file output is essentially blank (aside from the metadata, such as host, @timestamp, etc.). If I don't include the drop_fields part, I still get the 'message' field with the whole payload as one blob:

{  
   "@timestamp":"2019-06-13T17:01:01.075Z",
   "@metadata":{  
      "beat":"filebeat",
      "type":"doc",
      "version":"6.8.0"
   },
   "message":"{\n    \"accepted_targets\": [\n        {\n            \"server_info\": {\n                \"client_auth_credentials\": null, \n                \"client_auth_requirement\": \"DISABLED\", \n                \"highest_ssl_version_supported\": \"TLSV1_2\", \n                \"hostname\": \"server.lan\", \n                \"http_tunneling_settings\": null, \n                \"ip_address\": \"1.1.1.1\", \n                \"openssl_cipher_string_supported\": \"ECDHE-RSA-AES128-GCM-SHA256\", \n                \"port\": 443, \n                \"tls_server_name_indication\": \"f5.com\", \n                \"tls_wrapped_protocol\": \"PLAIN_TLS\", \n                \"xmpp_to_hostname\": null\n            }\n        }\n    ], \n    \"invalid_targets\": [], \n    \"sslyze_url\": \"https://github.com/nabla-c0d3/sslyze\", \n    \"sslyze_version\": \"1.4.3\", \n    \"total_scan_time\": \"6.27773499489\"",
   "source":"/home/ubuntu/sslyze/test/test_scan.json",
   "offset":0,
   "host":{  
      "name":"tlsscanner"
   }
}

I wonder whether the \n line breaks and spaces are causing issues?
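
On the line-break question: newlines and indentation are ordinary JSON whitespace, so on their own they should not stop a parser. What the 'message' value above does seem to lack is the final closing brace, since it ends right after "total_scan_time". A quick Python check (the parses helper is hypothetical, and the message is abbreviated) shows the difference:

```python
import json

# Abbreviated copy of the 'message' value above; it stops before the closing brace.
message = '{\n    "invalid_targets": [], \n    "sslyze_version": "1.4.3", \n    "total_scan_time": "6.27773499489"'


def parses(text):
    """Return True if text is one complete JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses(message))          # False: the document is cut off
print(parses(message + "\n}"))  # True: the newlines and indentation are fine
```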

I've tried feeding Filebeat the same test JSON file with all of the carriage returns and line feeds removed (so everything is on one very long line), but when I do this Filebeat fails to output anything at all.
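
One possible explanation for both symptoms: Filebeat's log input uses the newline character to detect the end of a line, and its documentation notes that the last line of a file is not read if it has no trailing newline. If the test files end with the closing } and no newline, the } would never reach the multiline event, and a flattened one-line file would produce nothing at all. The function below is a simplified model of that behaviour (complete_lines is a hypothetical name), not Filebeat's actual reader:

```python
def complete_lines(data):
    """Simplified model of a reader that only emits newline-terminated lines.

    Whatever follows the last newline character is treated as an incomplete
    line and held back, waiting for more data.
    """
    *complete, _pending = data.split("\n")
    return complete


pretty = '{\n  "sslyze_version": "1.4.3"\n}'  # pretty-printed, no trailing newline
one_line = '{"sslyze_version": "1.4.3"}'     # flattened, no trailing newline

print(complete_lines(pretty))           # the closing brace is never emitted
print(complete_lines(one_line))         # [] : nothing to publish
print(complete_lines(one_line + "\n"))  # ['{"sslyze_version": "1.4.3"}']
```

If this is the cause, appending a single newline to the end of each file should be enough for both the pretty-printed and the one-line variants to be picked up.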

Any ideas? Do I need to manually sanitise the JSON file before sending it to Filebeat?

Does anyone have a sample JSON file which IS correctly processed, so I can test it myself to make sure I'm not going entirely mad!?