Format JSON line

Hey all,

We have a kafka input that sends a single JSON line, at regular intervals, that looks like this:

{"load": {"min15": 0.05, "min5": 0.05, "cpucore": 2, "load_warning": 1.0, "min1": 0.01, "load_critical": 5.0, "load_careful": 0.7}, "docker": {}, "uptime": {"seconds": 1289416}, "system": {"os_name": "Linux", "platform": "64bit", "linux_distro": "Red Hat Enterprise Linux Server 7.5", "hostname": "server1.example.com", "hr_name": "Red Hat Enterprise Linux Server 7.5 64bit"}}

Using the kafka input plugin and the elasticsearch output, the data appears, but the single JSON line is being split into multiple documents like this:

{
  "_index": "glances-2018.33",
  "_type": "doc",
  "_id": "MFJkOmUBPBw285gEwadI",
  "_score": 1,
  "_source": {
    "cpucore": 2,
    "load_log": "False",
    "@version": "1",
    "load_careful": 0.7,
    "load_critical": 5,
    "min15": 0.05,
    "history_size": 28800,
    "min5": 0.01,
    "@timestamp": "2018-08-14T21:43:26.171Z",
    "min1": 0,
    "load_warning": 1
  },
  "fields": {
    "@timestamp": [
      "2018-08-14T21:43:26.171Z"
    ]
  }
}

{
  "_index": "glances-2018.33",
  "_type": "doc",
  "_id": "6txkOmUBox3BsJZkwRFJ",
  "_version": 1,
  "_score": null,
  "_source": {
    "hr_name": "Red Hat Enterprise Linux Server 7.5 64bit",
    "@version": "1",
    "history_size": 28800,
    "platform": "64bit",
    "os_name": "Linux",
    "linux_distro": "Red Hat Enterprise Linux Server 7.5",
    "hostname": "server1.example.com",
    "@timestamp": "2018-08-14T21:43:26.173Z",
    "os_version": "3.10.0-862.6.3.el7.x86_64"
  },
  "fields": {
    "@timestamp": [
      "2018-08-14T21:43:26.173Z"
    ]
  },
  "sort": [
    1534283006173
  ]
}

What I'd like is for the JSON line to be indexed as a single document, but I can't figure out how to do that. Any ideas?

Thanks,
Ryan

What does your Logstash configuration look like?

Thanks magnus, the config is quite simple:

input {
    kafka {
        codec => "json"
    }
}
output {
    elasticsearch {
        index => "glances-%{+xxxx.ww}"
        hosts => ["servera.example.com", "serverb.example.com"]
        user => "user"
        password => "password"
        ssl => true
    }
}
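The kafka input above is relying on the plugin defaults for the broker and topic; if I remember the plugin docs correctly, those defaults are bootstrap_servers => "localhost:9092" and topics => ["logstash"], so spelled out explicitly the input would look roughly like this:

```
input {
    kafka {
        # These two lines just make the plugin defaults explicit;
        # adjust them to match your broker and topic.
        bootstrap_servers => "localhost:9092"
        topics => ["logstash"]
        codec => "json"
    }
}
```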

The JSON blob in your example, fed through that Logstash configuration, can't possibly have produced those two documents in ES. For example, where did the history_size field in both documents come from? It's not in the alleged source message, and it's not added by Logstash.

I would suggest you fire up the console consumer and see what the messages coming out of kafka look like.

/path/to/kafka/bin/kafka-console-consumer.sh \
    --bootstrap-server localhost:9092 \
    --topic logstash

Sorry, I trimmed some excess key/value pairs out of the JSON to keep the line short, but I realize now that complicated things. I'll do as @Badger recommends and post the output shortly.

Ah, you were right: the messages coming into Kafka were not a single-line JSON blob like I originally thought. Each Kafka message is a JSON object exactly as Elasticsearch indexed it. It looks like glances changes its JSON output when using the kafka output instead of the file output.
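For anyone who hits the same thing, here's a rough Python sketch of what we observed. The per-section split is inferred purely from the documents Elasticsearch indexed, not from reading the glances source, so treat it as an assumption about its kafka output:

```python
import json

# The single combined JSON line glances writes with its *file* output
# (trimmed to two sections for brevity).
combined = json.loads(
    '{"load": {"min15": 0.05, "min5": 0.05, "cpucore": 2},'
    ' "system": {"os_name": "Linux", "hostname": "server1.example.com"}}'
)

# With the *kafka* output, glances appears to publish one message per
# top-level section instead -- roughly the equivalent of:
messages = [json.dumps(section) for section in combined.values()]

for msg in messages:
    print(msg)
```

Each element of `messages` is what arrived as a separate Kafka message, which is why Logstash (correctly) turned each one into its own document.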

Thank you both for the help!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.