Logstash Simple JSON


(Aloysius Paredes) #1

Trying to parse the following JSON file with Logstash:

{
  "glossary":
  {
    "title": "example glossary",
    "GlossDiv":
    {
      "title": "S",
      "GlossList":
      {
        "GlossEntry":
        {
          "ID": "SGML",
          "SortAs": "SGML",
          "GlossTerm": "Standard Generalized Markup Language",
          "Acronym": "SGML",
          "Abbrev": "ISO 8879:1986",
          "GlossDef":
          {
            "para": "A meta-markup language, used to create markup languages such as DocBook.",
            "GlossSeeAlso": ["GML", "XML"]
          },
          "GlossSee": "markup"
        }
      }
    }
  }
}

With the following Logstash config file:

input {
  file {
    type => "json"
    path => "/home/user/Elastic/logstash-6.2.3/data/queue/testJSON.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  json {
    source => "glossary"
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["localhost:9200"]
    user => "elastic"
    password => "changeme"
    index => "test_index1"
    document_type => "json"
  }
}

Output in Kibana Dev Tools: [screenshot not included]

Logstash isn't parsing it correctly: it parses the JSON line by line, but I want the entire JSON file to be one document. What am I doing wrong? I want each JSON file to be its own event, while keeping the JSON's data structure in Elasticsearch.


#2

Reading an entire file as a single event is something Logstash does not handle well. You can do it with the following configuration, but you need to kill Logstash after it has read the file.

input {
  file {
    path => "/.../test.json"
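    codec => multiline {
      # "^" matches every line, and with negate => false each line is
      # appended to the previous one, so the whole file becomes one event.
      pattern => "^"
      negate => false
      what => "previous"
      # Emit the accumulated event once no new lines arrive for 2 seconds.
      auto_flush_interval => 2
    }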
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter { json { source => "message" } }
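One way to handle the "kill it afterwards" part is a wrapper that stops Logstash after a fixed time. This is only a sketch, not a script from this thread: it assumes GNU coreutils `timeout` is installed, and `run_then_stop` is a hypothetical helper name. The limit should be comfortably longer than Logstash's startup time plus the 2-second auto_flush_interval.

```shell
#!/bin/sh
# Hypothetical helper: run a command (for example Logstash with the multiline
# config above) and stop it after a fixed number of seconds.
# Assumes GNU coreutils `timeout`, which exits with status 124 when it fires.
run_then_stop() {
  secs=$1
  shift
  # timeout sends SIGTERM to the command after $secs seconds.
  timeout "$secs" "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    # The time limit was reached; treat that as the expected outcome.
    echo "stopped after ${secs}s" >&2
    return 0
  fi
  # The command exited on its own; pass its status through.
  return "$status"
}
```

For example `run_then_stop 120 bin/logstash -f single-file.conf`, where single-file.conf is a hypothetical file holding the config above. A time limit is crude, but it avoids polling for the output.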

(Aloysius Paredes) #3

@Badger What would be the best way to ingest multiple JSON files into Elasticsearch? (Where each JSON file is a single event)


#4

Starting a new instance of Logstash for each file is really expensive. The above will process one file, and you could have a script to kill it. Or you could use a stdin input (provided the file is less than 16 KB), which terminates Logstash on EOF. To use a single instance of Logstash you could go through Kafka or an http input.

But if what you really need to do is ingest JSON files into Elasticsearch, I would bypass Logstash and do it using /bin/sh and curl.
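A minimal sketch of that approach, sending one request per file so that each file becomes one document. Assumptions not stated in the thread: Elasticsearch 6.x (matching logstash-6.2.3), where the mapping type is part of the URL; the credentials, index and type reused from the config in the first post; and the hypothetical function name `index_json_dir`. On Elasticsearch 7 and later the type segment would be `_doc`.

```shell
#!/bin/sh
# Sketch: index each JSON file in a directory as one Elasticsearch document.
# Assumed defaults (override via environment): ES 6.x on localhost:9200,
# user elastic/changeme, index test_index1, type json.
ES_URL=${ES_URL:-http://localhost:9200}
ES_AUTH=${ES_AUTH:-elastic:changeme}
INDEX=${INDEX:-test_index1}
DOC_TYPE=${DOC_TYPE:-json}

index_json_dir() {
  dir=$1
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue   # the glob matched nothing
    echo "Indexing $f"
    # POST without an ID lets Elasticsearch generate one. ES 6 requires the
    # Content-Type header; --data-binary sends the file exactly as it is.
    curl -s -u "$ES_AUTH" -H 'Content-Type: application/json' \
      -XPOST "$ES_URL/$INDEX/$DOC_TYPE" --data-binary "@$f"
    echo
  done
}
```

Usage would be something like `index_json_dir /home/user/Elastic/logstash-6.2.3/data/queue`. Pretty-printed files are fine here, since the index API accepts a multi-line JSON body.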


#5

I wanted to know how this would work with an http input. It is very simple to configure. On the Logstash side I just have:

input { http { host => "127.4.31.9" port => 4000 } }
output { stdout { codec => rubydebug } }

and I can send a directory full of files to it using

for F in /etc/logstash/t.httpInput/data/*.xml ; do
  echo "Processing $F"
  # --data-binary sends the file unchanged; plain -d would strip its newlines
  curl -H 'Content-Type: application/xml' -XPUT 'http://127.4.31.9:4000/' --data-binary "@$F"
  echo ''
done

I thought I was going to have to configure a multiline codec, but no, each PUT results in a single event.
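The same loop can be pointed at JSON files. A sketch, assuming the http input shown above and a hypothetical helper `send_json_dir`: the http input's documented default for `additional_codecs` maps `application/json` to the json codec (worth confirming for your plugin version), so with that Content-Type each file should arrive as one already-parsed event, with no json filter needed.

```shell
#!/bin/sh
# Sketch: PUT each JSON file to the Logstash http input, one request per file.
# Host and port match the http input config above.
LS_URL=${LS_URL:-http://127.4.31.9:4000/}

send_json_dir() {
  dir=$1
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue   # the glob matched nothing
    echo "Processing $f"
    # application/json selects the http input's json codec by default.
    curl -H 'Content-Type: application/json' -XPUT "$LS_URL" --data-binary "@$f"
    echo
  done
}
```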


(Aloysius Paredes) #6

Fixed:

Used the following Logstash config file:

input {
  file {
    type => "json"
    path => "/home/prism-dev/Elastic/logstash-6.2.3/data/queue/*.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => "json"
  }
}

filter {
  json {
    source => "message"
  }
}

output {
  stdout { codec => rubydebug }

  elasticsearch {
    hosts => ["localhost:9200"]
    user => "elastic"
    password => "changeme"
    index => "testindex"
    document_type => "json"
  }
}

@Badger This works for reading multiple JSON files where each JSON file holds 1 event.
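One caveat: the file input splits its input on newlines (its `delimiter` defaults to "\n") before the codec sees it, so this config expects each file to hold its JSON on a single line; a pretty-printed file like the glossary example would be decoded line by line. A sketch for flattening such files first, using only `tr`; `compact_json_file` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: rewrite a pretty-printed JSON file as a single line, so the file
# input hands the whole document to the json codec as one line.
# Safe for valid JSON: raw newlines cannot appear inside JSON strings,
# so line breaks only ever occur as whitespace between tokens.
compact_json_file() {
  src=$1
  dst=$2
  # Delete CR/LF, then add one trailing newline so the file input sees a
  # complete line (it only emits lines that end with the delimiter).
  { tr -d '\r\n' < "$src"; echo; } > "$dst"
}
```

For example `compact_json_file testJSON.json queue/testJSON.json` before Logstash picks the file up.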

