JSON Parsing Issue

Given the following config:

input {
    udp {
        type => "genericJson2"
        port => 9994
    }
}

filter {
    if [type] == "genericJson2" {
        json {
            source => "message"
        }
    }
}

output {
    elasticsearch {
    }
}

and the following input:

{"date":"2018-02-27T13:21:41.3387552-05:00","level":"INFO","appname":"listenercore","logger":"Main","thread":"1","message":"test"}

I get the following result:

{
  "_index": "logstash-2018.02.27",
  "_type": "doc",
  "_id": "AWHYfh_qDl_9h030IXjC",
  "_score": 1,
  "_source": {
    "type": "genericJson2",
    "@timestamp": "2018-02-27T18:19:59.747Z",
    "host": "10.120.4.5",
    "@version": "1",
    "date": "{\"date\":\"2018-02-27T13:20:02.2113",
    "message": "{\"da",
    "logger": "{\"da",
    "thread": "{",
    "level": "{\"da",
    "appname": "{\"date\":\"201"
  },
  "fields": {
    "@timestamp": [
      "2018-02-27T18:19:59.747Z"
    ]
  }
}

What do I need to do to get my JSON logs parsed correctly?

EDIT

I dug in a little deeper. Running this from the command line

sudo bin/logstash -e "input{stdin{type=>stdin}} filter{json {source=>message}} output{ stdout{ codec=>rubydebug } }"

produced the desired output:

{
    "@timestamp" => 2018-02-28T02:07:01.710Z,
          "host" => "Elastisearch01",
       "appname" => "listenercore",
        "logger" => "Main",
      "@version" => "1",
          "type" => "stdin",
          "date" => "2018-02-27T13:21:41.3387552-05:00",
         "level" => "INFO",
        "thread" => "1",
       "message" => "test"
}

So I wrote a quick Python UDP server to see what's coming across the wire. Here is what I captured:

{ " d a t e " : " 2 0 1 8 - 0 2 - 2 7 T 2 1 : 0 6 : 0 4 . 7 4 6 1 3 4 6 - 0 5 : 0 0 " , " l e v e l " : " I N F O " , " a p p n a m e " : " l i s t e n e r c o r e " , " l o g g e r " : " M a i n " , " t h r e a d " : " 1 " ,   m e s s a g e " : " t e s t " }

There are extra spaces between each character. I'm investigating text encodings, but I'm not sure yet.
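If the sender is a Windows/.NET application, UTF-16LE would explain the gaps (this was my working assumption at this point, confirmed below): every ASCII character carries a 0x00 high byte, and those null bytes render as spaces when the stream is read as single-byte text. A quick Python check:

# Each ASCII character encoded as UTF-16LE gains a 0x00 high byte;
# read back as single-byte text, those nulls look like gaps between letters.
text = '{"date":"2018"}'
print(text.encode('utf-16-le'))
# b'{\x00"\x00d\x00a\x00t\x00e\x00"\x00:\x00"\x002\x000\x001\x008\x00"\x00}\x00'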

EDIT

Well, I've pretty much verified this is an encoding issue. If I capture, decode, and retransmit the logs with this Python script, it solves the problem:

import socket

UDP_IP_ADDRESS = "10.254.18.166"
UDP_PORT_NO = 9993

# Listen for the raw UTF-16 datagrams coming from the application
serverSock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
serverSock.bind((UDP_IP_ADDRESS, UDP_PORT_NO))

# Forward each datagram, re-encoded, to the logstash udp input
clientSock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

while True:
    data, addr = serverSock.recvfrom(1024)
    # sendto() needs bytes, so decode from UTF-16 and re-encode as UTF-8
    clientSock.sendto(data.decode('utf-16').encode('utf-8'), ("127.0.0.1", 9994))
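For reference, a minimal sender to exercise the relay (the payload here is a made-up sample, not captured from the real application):

import socket

# Send one BOM-prefixed UTF-16 JSON datagram to the relay listening on 9993
payload = '{"level":"INFO","message":"test"}'.encode('utf-16')
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(payload, ("10.254.18.166", 9993))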

How can I get Logstash to read my UTF-16 input?
I've tried this and it isn't working:

bin/logstash -e "input{udp{port=>9994 type=>stdin codec=>plain{charset=>'UTF-16'}}} filter{json {source=>message}} output{ stdout{ codec=>rubydebug } }"
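One detail that may matter (an assumption on my part, not something I've verified): Ruby generally cannot transcode from bare UTF-16 without a byte-order mark, and Windows senders often emit BOM-less UTF-16LE, so naming the byte order explicitly might behave differently:

input {
    udp {
        port => 9994
        type => "genericJson2"
        codec => plain {
            charset => "UTF-16LE"
        }
    }
}

The filter and output sections would stay as in the original config.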

Hi @wjdavis5,

Try the KV filter:

kv {
    field_split => ","
    value_split => ":"
    source => "message"
}

Thanks & Regards,
Krunal.

You can try with codec => json inside udp { }
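For clarity, a sketch of what that suggestion looks like (the charset setting is an addition based on the encoding findings above; it was not part of this reply):

input {
    udp {
        port => 9994
        type => "genericJson2"
        codec => json {
            charset => "UTF-16LE"
        }
    }
}

With the json codec doing the parsing on the input, the separate json filter would no longer be needed.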

Can you please explain what this does? It seems pretty obvious to me that this is an encoding issue. Windows systems use UTF-16, but when I use codec=>plain{charset=>'UTF-16'}, it decodes everything as garbage.

However, if I write a Python app to decode the same stream as UTF-16, the text decodes perfectly.

I've updated the question to include much more detail.
