Input multiline JSON?

Hello!

I have the following JSON file and I send it to an Amazon Elasticsearch Service domain using Filebeat, but Elasticsearch isn't able to index it properly.

{
	"scan": {
		"id_scan": "2019-01-22 19:00:35",
		"files": [
			{"filename": "data_jan/data.txt",
			"findings": [
				{"quote": "Aurora Ramírez", "info_type": "PERSON_NAME", "likelihood": "4"},
				{"quote": "Aurora", "info_type": "FIRST_NAME", "likelihood": "4"},
				{"quote": "Ramírez", "info_type": "LAST_NAME", "likelihood": "4"},
				{"quote": "Aurora Ramírez", "info_type": "FEMALE_NAME", "likelihood": "4"},
				{"quote": "+34 629811498", "info_type": "PHONE_NUMBER", "likelihood": "3"},
				{"quote": "48027218K", "info_type": "SPAIN_NIF_NUMBER", "likelihood": "4"},
				{"quote": "Joan Maragall 11B", "info_type": "PERSON_NAME", "likelihood": "4"},
				{"quote": "Joan", "info_type": "FIRST_NAME", "likelihood": "4"},
				{"quote": "Joan Maragall 11B", "info_type": "FEMALE_NAME", "likelihood": "4"},
				{"quote": "Carrer Joan Maragall 11B, Barcelona, Spain", "info_type": "LOCATION", "likelihood": "3"},
				{"quote": "192.0.13.1", "info_type": "IP_ADDRESS", "likelihood": "4"},
				{"quote": "27-10-2018", "info_type": "DATE", "likelihood": "4"},
				{"quote": "27/10/18", "info_type": "DATE_OF_BIRTH", "likelihood": "5"},
				{"quote": "04/03/1967", "info_type": "DATE_OF_BIRTH", "likelihood": "5"},
				{"quote": "robertlangdon@security.com", "info_type": "EMAIL_ADDRESS", "likelihood": "4"},
				{"quote": "security.com", "info_type": "DOMAIN_NAME", "likelihood": "4"},
				{"quote": "Male", "info_type": "GENDER", "likelihood": "3"},
				{"quote": "ca", "info_type": "LOCATION", "likelihood": "3"},
				{"quote": "10:00pm", "info_type": "TIME", "likelihood": "4"},
				{"quote": "15:34:04", "info_type": "TIME", "likelihood": "4"},
				{"quote": "https://www.hello.com", "info_type": "URL", "likelihood": "4"},
				{"quote": "www.hello.com", "info_type": "DOMAIN_NAME", "likelihood": "4"}
			]
			}
		]
	}
}

This is the multiline configuration in filebeat.yml, with the input type set to log:

 ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false

  # Match can be set to "after" or "before". It is used to define if lines should be appended to a pattern
  # that was (not) matched before or after, or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to next in Logstash
  #multiline.match: after

  
  json.keys_under_root: true
  json.add_error_key: true

Kibana only recognises the fields "quote", "info_type" and "likelihood", and shows them as unknown ('?') fields rather than keywords. The other fields are not indexed at all. I think that's because Filebeat only works with one object per line and doesn't allow objects within objects.

I've tried to manually specify the fields and the template, so that Kibana knows about them:

GET filebeat-6.5.4-2019.01.16/_mapping/field/scan*

Output:

{
  "filebeat-6.5.4-2019.01.16": {
    "mappings": {
      "doc": {
        "scan.id_scan": {
          "full_name": "scan.id_scan",
          "mapping": {
            "id_scan": {
              "type": "date"
            }
          }
        },
        "scan.files.findings.likelihood": {
          "full_name": "scan.files.findings.likelihood",
          "mapping": {
            "likelihood": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        },
        "scan.files.filename": {
          "full_name": "scan.files.filename",
          "mapping": {
            "filename": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        },
        "scan.files.findings.quote": {
          "full_name": "scan.files.findings.quote",
          "mapping": {
            "quote": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        },
        "scan.files.findings.info_type": {
          "full_name": "scan.files.findings.info_type",
          "mapping": {
            "info_type": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        }
      }
    }
  }
}
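
One possible direction here would be to declare the arrays as nested via an index template before the index is created. A rough sketch (the template name is a placeholder, it would have to be ordered after the template Filebeat installs, and Kibana 6.x has very limited support for displaying or querying nested fields):

    PUT _template/filebeat-scan-nested
    {
      "index_patterns": ["filebeat-6.5.4-*"],
      "order": 1,
      "mappings": {
        "doc": {
          "properties": {
            "scan": {
              "properties": {
                "files": {
                  "type": "nested",
                  "properties": {
                    "findings": {
                      "type": "nested"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }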

Any idea what I'm doing wrong? Is this even possible to achieve, with or without Logstash?

Many thanks in advance!

Filebeat indeed only supports one JSON event per line. Your multiline config is fully commented out, so I can't tell how or why you are able to read and publish events at all.
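
For illustration, the scan document from the question would have to be written as one object on a single line, e.g. (shortened here to two findings):

    {"scan": {"id_scan": "2019-01-22 19:00:35", "files": [{"filename": "data_jan/data.txt", "findings": [{"quote": "Aurora Ramírez", "info_type": "PERSON_NAME", "likelihood": "4"}, {"quote": "+34 629811498", "info_type": "PHONE_NUMBER", "likelihood": "3"}]}]}}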

Try to avoid objects inside arrays. For Filebeat it's just an array and it will ship the event as-is, but you won't be able to display those objects properly in Kibana.
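
As an aside, if you did want to stitch the pretty-printed document into a single event, uncommenting the multiline options with something like the sketch below would group it (assuming the top-level braces are the only content starting at column 0). But note that with the log input the json.* decoding is applied line by line, before multiline grouping, so the combined event would still not be parsed as JSON; one object per line is the cleaner fix.

    multiline.pattern: '^\{'
    multiline.negate: true
    multiline.match: after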

Thanks! A few more questions:

  1. Does it mean that I can have arrays of values but not arrays of objects? If so, how does Kibana show these arrays and their values?

  2. Is it possible to index a json structure like mine using Logstash?

From your configuration I assume your logs contain one JSON document per line.

Filebeat does allow arrays of objects. Filebeat just parses the JSON (assuming it is complete) and sends it to Elasticsearch as-is. The same goes for Logstash.

You can switch to the console output to see the actual event that would be sent to Elasticsearch.
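
A minimal sketch of that in filebeat.yml (only one output can be enabled at a time, so the elasticsearch output has to be commented out first):

    output.console:
      pretty: true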

Looks like you are hitting some limitation in Elasticsearch or Kibana.

Elasticsearch will treat your document (assuming Filebeat did see the complete JSON) as a nested object.
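
One caveat: unless the mapping explicitly declares those fields as nested, the default object mapping flattens arrays of objects, so internally the document behaves roughly like the snippet below (first three findings shown). That is why the association between quote, info_type and likelihood within a single finding gets lost:

    {
      "scan.id_scan": "2019-01-22 19:00:35",
      "scan.files.filename": ["data_jan/data.txt"],
      "scan.files.findings.quote": ["Aurora Ramírez", "Aurora", "Ramírez"],
      "scan.files.findings.info_type": ["PERSON_NAME", "FIRST_NAME", "LAST_NAME"],
      "scan.files.findings.likelihood": ["4", "4", "4"]
    }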
