Parsing array of JSON objects with Logstash and ingesting to Elasticsearch

Hi,

I am trying to ingest data from Logstash into Elasticsearch. I have this array of JSON objects, and I want each element of the array to become a document in Elasticsearch, with the JSON keys as the field names.

    [
    {"name": "bouza", "age": 40, "type": "customer", "credit": "Nil", "date":"2019-10-08T22:52:31-07:00"},
    {"name": "carmen", "age": 20, "type": "customer", "credit": "Nil", "date":"2019-10-09T21:11:01-07:00"},
    {"name": "karen", "age": 31, "type": "customer", "credit": "Nil", "date":"2019-10-08T20:09:16-07:00"},
    {"name": "varmin", "age": 24, "type": "customer", "credit": "Nil", "date":"2019-10-08T12:21:45-07:00"},
    ]

I tried this, but Logstash doesn't do anything when I run it:

    input {
      file {
        path => "/home/waldo/credit_data/*.json"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => json_lines
      }
    }

    output {
      elasticsearch {
        hosts => ["127.0.0.1:9200"]
        index => "credit_data"
      }
    }

I tried both the json and json_lines codecs; neither seems to do anything. I would also like the date to be ingested into Elasticsearch as a date field.

I searched for this, but wasn't able to find an answer, although others seem to have asked the same question and found a solution.

Any help is appreciated. Thank you so much in advance!

None of the individual lines are valid JSON. The entire array is almost valid JSON (you need to remove the , that precedes the ]). You can read the entire file as a single event using a multiline codec with a pattern that never matches:

    codec => multiline {
      pattern => "^Spalanzani"
      negate => true
      what => previous
      auto_flush_interval => 1
      multiline_tag => ""
    }

Then split the array

    split { field => "someField" }
    date { match => [ "[someField][date]", "YYYY-MM-dd'T'HH:mm:ssZZ" ] }

If you need to move the contents of [someField] to the top level you can do it in a ruby filter similar to this.
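For reference, such a ruby filter often looks something like the sketch below (untested against this exact pipeline; it assumes [someField] holds the object hash left on each event after the split):

    filter {
      ruby {
        code => '
          # copy each key of the sub-object to the top level, then drop the wrapper
          event.get("someField").each { |k, v| event.set(k, v) }
          event.remove("someField")
        '
      }
    }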


Thanks a lot Badger!

Indeed, my array does not actually have the "," at the end. It was a typo.

What does the pattern "^Spalanzani" mean here?

"^Spalanzani" just means a line starting with the word Spalanzani. That never matches the actual contents of the file, so with negate => true every line counts as a continuation of the previous one, and the entire contents of the file are joined into one event.
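This can be sanity-checked outside Logstash; a quick Python illustration (demonstration only, not part of the pipeline):

```python
import json

# The opening bracket, the closing bracket, and every comma-terminated
# object line fail to parse on their own -- which is why the json_lines
# codec produced no events.
for line in ['[', '{"name": "bouza", "age": 40},', ']']:
    try:
        json.loads(line)
        print("valid on its own:", line)
    except json.JSONDecodeError:
        print("not valid JSON on its own:", line)

# Joined into a single event, the same content parses as one array.
docs = json.loads('[\n{"name": "bouza", "age": 40},\n{"name": "carmen", "age": 20}\n]')
print(len(docs))  # 2
```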


Thanks Badger. Got it!

I tried modifying my conf file:

    input {
      file {
        path => "/home/waldo/credit_data/test.json"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => multiline {
          pattern => "^Spalanzani"
          negate => true
          what => previous
          auto_flush_interval => 1
          multiline_tag => ""
        }
      }
    }

    filter {
      split {
        field => "someField"
      }
      date {
        match => [ "[someField][date]", "YYYY-MM-dd'T'HH:mm:ssZZ" ]
      }
    }

    output {
      elasticsearch {
        hosts => ["127.0.0.1:9200"]
        index => "credit_data"
      }
    }

Here's my JSON-formatted data input:

[
  {
    "date": "2019-10-08T22:52:31-07:00",
    "credit": "Nil",
    "type": "customer",
    "age": 40,
    "name": "bouza"
  },
  {
    "date": "2019-10-09T21:11:01-07:00",
    "credit": "Nil",
    "type": "customer",
    "age": 20,
    "name": "carmen"
  },
  {
    "date": "2019-10-08T20:09:16-07:00",
    "credit": "Nil",
    "type": "customer",
    "age": 31,
    "name": "karen"
  },
  {
    "date": "2019-10-08T12:21:45-07:00",
    "credit": "Nil",
    "type": "customer",
    "age": 24,
    "name": "varmin"
  }
]

However, when I start logstash, I get:

[INFO ] 2019-10-13 22:59:13.702 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
[WARN ] 2019-10-13 22:59:15.457 [[main]>worker16] split - Only String and Array types are splittable. field:someField is of type = NilClass

Do I need to define "someField"? I was assuming that the keys in the JSON would map to the Elasticsearch key-value pairs.

Thanks for your help so far!

I forgot to mention the json filter:

    filter {
      json {
        source => "message"
        target => "someField"
        remove_field => [ "message" ]
      }
    }

Thanks Badger. Unfortunately, it doesn't seem to do anything.
I added the below, but it doesn't ingest anything.

    filter {
      json {
        source => "message"
        target => "someField"
        remove_field => [ "message" ]
      }
      split {
        field => "someField"
      }
      date {
        match => [ "[someField][date]", "YYYY-MM-dd'T'HH:mm:ssZZ" ]
      }
    }

I am unable to explain why that would not work.

Yeah, thanks Badger.
I ended up formatting my input as individual newline-separated JSON events and configured the filter accordingly. That worked for me.
I will try to find out why the above doesn't work. Thanks a lot for your help!
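For anyone landing here later, that conversion (a JSON array file into newline-delimited events that the json_lines codec can ingest) can be done with any JSON-aware tool; a minimal Python sketch, with a made-up function name:

```python
import json

def array_to_ndjson(array_text):
    """Turn a JSON array into newline-delimited JSON, one document per line,
    which Logstash's json_lines codec can then ingest directly."""
    return "\n".join(json.dumps(doc) for doc in json.loads(array_text))

print(array_to_ndjson('[{"name": "bouza", "age": 40}, {"name": "carmen", "age": 20}]'))
```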

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.