I've tried really (really) hard to sort this before asking for help here, so I'm desperately hoping someone can help as it's driving me crazy! Fair warning... I'm still pretty new to ELK so there's a good chance I'm missing some basics here.
What I'm trying to do
Import lots of JSON files into Elasticsearch. Each 'log' entry is a completely separate JSON file, and the contents of each file are multi-line. These aren't exactly logs, either... each JSON file is the result of an SSL scan I'm running against internal hosts using the tool SSLyze: https://github.com/nabla-c0d3/sslyze/
The resulting JSON files are large (and probably causing other issues, which I'll post about separately), but a heavily trimmed version is as follows:
{
  "accepted_targets": [
    {
      "server_info": {
        "client_auth_credentials": null,
        "client_auth_requirement": "DISABLED",
        "highest_ssl_version_supported": "TLSV1_2",
        "hostname": "test-server.lan",
        "http_tunneling_settings": null,
        "ip_address": "1.1.1.1",
        "openssl_cipher_string_supported": "ECDHE-RSA-AES128-GCM-SHA256",
        "port": 443,
        "tls_server_name_indication": "test-server.lan",
        "tls_wrapped_protocol": "PLAIN_TLS",
        "xmpp_to_hostname": null
      }
    }
  ],
  "invalid_targets": [],
  "sslyze_url": "https://github.com/nabla-c0d3/sslyze",
  "sslyze_version": "1.4.3",
  "total_scan_time": "6.27773499489"
}
The problem
Although I can import these JSON files just fine into Elasticsearch, the multi-line JSON document (as you can see above) is imported as one big 'blob' in the message field. See below...
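In other words, the entire document ends up as a single string inside the message field; heavily abbreviated (and paraphrased from memory), each document looks roughly like this:

message: "{ \"accepted_targets\": [ { \"server_info\": { \"client_auth_credentials\": null, ... } } ], \"invalid_targets\": [], ... \"total_scan_time\": \"6.27773499489\" }"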
What I want
I need the JSON keys to show up as fields in Elasticsearch. (In a perfect world I'd select which JSON keys to convert to fields, as having every single one would likely be too much.) This would allow me to index, and easily search and report on, for example, all hosts that are negotiating old ciphers.
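For example (the field names here are just my guess at what the flattened keys would look like, based on the trimmed JSON above), I'd love to be able to search on fields such as:

accepted_targets.server_info.hostname: "test-server.lan"
accepted_targets.server_info.highest_ssl_version_supported: "TLSV1_2"
accepted_targets.server_info.openssl_cipher_string_supported: "ECDHE-RSA-AES128-GCM-SHA256"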
What I've tried
I originally had the following set-up: Filebeat > Logstash > Elasticsearch. I've since removed Logstash, as it wasn't helping me (due to my own inexperience); although the files were making their way into ES, they weren't formatted any better than when going direct from Filebeat to ES.
My current filebeat.yml is as follows (you can see from the commented-out lines that I've tried multiple configs, all to no avail):
filebeat.config.modules:
  path: /etc/filebeat/modules.d/*.yml

filebeat.prospectors:
- paths:
  - /home/ubuntu/sslyze/test/*.json
  # document_type: sslscanning
  # json.keys_under_root: true
  # json.add_error_key: false
  # json.message_key: accepted_targets
  # json.overwrite_keys: true
  multiline.pattern: '^{'
  multiline.negate: true
  multiline.match: after
  multiline.max_lines: 5000
  multiline.timeout: 10

processors:
  - decode_json_fields:
      fields: ['message']
      target: ""
      process_array: true
      max_depth: 8
      overwrite_keys: true

output.elasticsearch:
  hosts: ["9.9.9.9:9200"]
  template.name: filebeat
  template.path: filebeat.template.json
I've tried different 'fields' under the decode_json_fields processor, I've tried different targets, and I've tried multi-line on and off in various flavours with different pattern matches. Nothing I do seems to make a difference (or, rather, make it any better... I've made things plenty worse in my playing around).
FWIW, I'm on Filebeat 6.8 and Elasticsearch 6.8.
Anything else?
I'm assuming that I'm going about this the correct way, of course. Maybe I should be using Logstash to manipulate the JSON doc? Or maybe importing a big JSON blob into Elasticsearch is okay, and I should just be using the tools within Kibana to expose the data I want?
If I need to, I could probably create a simple Python script to reformat the JSON files (if that would help?), but I'm hoping that Filebeat (or Logstash??) will be able to import these in the format I need.
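For what it's worth, the kind of script I had in mind is something like the rough (untested) sketch below. It just re-writes each pretty-printed SSLyze file as a single-line JSON document, which, as far as I understand, is the shape Filebeat's json.* options expect. The output directory name is made up:

#!/usr/bin/env python3
# Rough sketch (untested): collapse each pretty-printed SSLyze JSON file
# into a single-line JSON document that Filebeat could then read per line.
import json
from pathlib import Path

SRC_DIR = Path("/home/ubuntu/sslyze/test")       # where the multi-line JSON files live
DST_DIR = Path("/home/ubuntu/sslyze/flattened")  # hypothetical output directory
DST_DIR.mkdir(parents=True, exist_ok=True)

for src in SRC_DIR.glob("*.json"):
    with src.open() as f:
        doc = json.load(f)  # parse the whole multi-line document
    # (this would also be the place to drop any keys I don't care about)
    with (DST_DIR / src.name).open("w") as f:
        json.dump(doc, f, separators=(",", ":"))  # emit as a single line
        f.write("\n")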
Any help very much appreciated.