Input multiline JSON?

Hello!

I have the following JSON file and I send it to an Amazon Elasticsearch Service domain using Filebeat, but Elasticsearch isn't able to index it properly.

{
	"scan": {
		"id_scan": "2019-01-22 19:00:35",
		"files": [
			{"filename": "data_jan/data.txt",
			"findings": [
				{"quote": "Aurora Ramírez", "info_type": "PERSON_NAME", "likelihood": "4"},
				{"quote": "Aurora", "info_type": "FIRST_NAME", "likelihood": "4"},
				{"quote": "Ramírez", "info_type": "LAST_NAME", "likelihood": "4"},
				{"quote": "Aurora Ramírez", "info_type": "FEMALE_NAME", "likelihood": "4"},
				{"quote": "+34 629811498", "info_type": "PHONE_NUMBER", "likelihood": "3"},
				{"quote": "48027218K", "info_type": "SPAIN_NIF_NUMBER", "likelihood": "4"},
				{"quote": "Joan Maragall 11B", "info_type": "PERSON_NAME", "likelihood": "4"},
				{"quote": "Joan", "info_type": "FIRST_NAME", "likelihood": "4"},
				{"quote": "Joan Maragall 11B", "info_type": "FEMALE_NAME", "likelihood": "4"},
				{"quote": "Carrer Joan Maragall 11B, Barcelona, Spain", "info_type": "LOCATION", "likelihood": "3"},
				{"quote": "192.0.13.1", "info_type": "IP_ADDRESS", "likelihood": "4"},
				{"quote": "27-10-2018", "info_type": "DATE", "likelihood": "4"},
				{"quote": "27/10/18", "info_type": "DATE_OF_BIRTH", "likelihood": "5"},
				{"quote": "04/03/1967", "info_type": "DATE_OF_BIRTH", "likelihood": "5"},
				{"quote": "robertlangdon@security.com", "info_type": "EMAIL_ADDRESS", "likelihood": "4"},
				{"quote": "security.com", "info_type": "DOMAIN_NAME", "likelihood": "4"},
				{"quote": "Male", "info_type": "GENDER", "likelihood": "3"},
				{"quote": "ca", "info_type": "LOCATION", "likelihood": "3"},
				{"quote": "10:00pm", "info_type": "TIME", "likelihood": "4"},
				{"quote": "15:34:04", "info_type": "TIME", "likelihood": "4"},
				{"quote": "https://www.hello.com", "info_type": "URL", "likelihood": "4"},
				{"quote": "www.hello.com", "info_type": "DOMAIN_NAME", "likelihood": "4"}
			]
			}
		]
	}
}

This is the multiline configuration in filebeat.yml, with the input type set to log:

 ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false

  # Match can be set to "after" or "before". It is used to define if lines should be appended to a pattern
  # that was (not) matched before or after, or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to next in Logstash
  #multiline.match: after

  
  json.keys_under_root: true
  json.add_error_key: true

Kibana only recognises the fields "quote", "info_type" and "likelihood", and shows them as unknown ('?') fields rather than keywords. The other fields are not indexed at all. I think that's because Filebeat only works with one object per line and doesn't allow objects within objects.

I've tried to manually specify the fields and the template, so that Kibana knows about them:

GET filebeat-6.5.4-2019.01.16/_mapping/field/scan*

Output:

{
  "filebeat-6.5.4-2019.01.16": {
    "mappings": {
      "doc": {
        "scan.id_scan": {
          "full_name": "scan.id_scan",
          "mapping": {
            "id_scan": {
              "type": "date"
            }
          }
        },
        "scan.files.findings.likelihood": {
          "full_name": "scan.files.findings.likelihood",
          "mapping": {
            "likelihood": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        },
        "scan.files.filename": {
          "full_name": "scan.files.filename",
          "mapping": {
            "filename": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        },
        "scan.files.findings.quote": {
          "full_name": "scan.files.findings.quote",
          "mapping": {
            "quote": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        },
        "scan.files.findings.info_type": {
          "full_name": "scan.files.findings.info_type",
          "mapping": {
            "info_type": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        }
      }
    }
  }
}
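
One possible direction here would be to declare the arrays as nested via an index template before the index is created. A rough sketch (the template name is a placeholder, it would have to be ordered after the template Filebeat installs, and Kibana 6.x has very limited support for displaying or querying nested fields):

    PUT _template/filebeat-scan-nested
    {
      "index_patterns": ["filebeat-6.5.4-*"],
      "order": 1,
      "mappings": {
        "doc": {
          "properties": {
            "scan": {
              "properties": {
                "files": {
                  "type": "nested",
                  "properties": {
                    "findings": {
                      "type": "nested"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }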

Any idea what I'm doing wrong? Is this even possible to achieve, with or without Logstash?

Many thanks in advance!

Filebeat indeed only supports one JSON event per line. Your multiline config is fully commented out, so I can't tell how or why you are able to read and publish events at all.
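
For illustration, the scan document from the question would have to be written as one object on a single line, e.g. (shortened here to two findings):

    {"scan": {"id_scan": "2019-01-22 19:00:35", "files": [{"filename": "data_jan/data.txt", "findings": [{"quote": "Aurora Ramírez", "info_type": "PERSON_NAME", "likelihood": "4"}, {"quote": "+34 629811498", "info_type": "PHONE_NUMBER", "likelihood": "3"}]}]}}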

Try to avoid objects inside arrays. For Filebeat it's just an array and it will ship the event as-is, but you won't be able to display those objects properly in Kibana.
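
As an aside, if you did want to stitch the pretty-printed document into a single event, uncommenting the multiline options with something like the sketch below would group it (assuming the top-level braces are the only content starting at column 0). But note that with the log input the json.* decoding is applied line by line, before multiline grouping, so the combined event would still not be parsed as JSON; one object per line is the cleaner fix.

    multiline.pattern: '^\{'
    multiline.negate: true
    multiline.match: after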

Thanks! A few more questions:

  1. Does it mean that I can have arrays of values but not arrays of objects? If so, how does Kibana show these arrays and their values?

  2. Is it possible to index a json structure like mine using Logstash?

From your configuration I assume your logs contain one JSON document per line.

Filebeat does allow arrays of objects. Filebeat just parses the JSON (assuming it is complete) and sends it to Elasticsearch as-is. The same goes for Logstash.

You can switch to the console output to see the actual event that would be sent to Elasticsearch.
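
A minimal sketch of that in filebeat.yml (only one output can be enabled at a time, so the elasticsearch output has to be commented out first):

    output.console:
      pretty: true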

Looks like you are hitting some limitation in Elasticsearch or Kibana.

Elasticsearch will treat your document (assuming Filebeat did see the complete JSON) as a nested object.
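
One caveat: unless the mapping explicitly declares those fields as nested, the default object mapping flattens arrays of objects, so internally the document behaves roughly like the snippet below (first three findings shown). That is why the association between quote, info_type and likelihood within a single finding gets lost:

    {
      "scan.id_scan": "2019-01-22 19:00:35",
      "scan.files.filename": ["data_jan/data.txt"],
      "scan.files.findings.quote": ["Aurora Ramírez", "Aurora", "Ramírez"],
      "scan.files.findings.info_type": ["PERSON_NAME", "FIRST_NAME", "LAST_NAME"],
      "scan.files.findings.likelihood": ["4", "4", "4"]
    }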
