JSON Parsing Issue


(William Davis) #1

Given the following config:

input {
    udp {
        type => "genericJson2"
        port => 9994
    }
}

filter {
    if [type] == "genericJson2" {
        json {
            source => "message"
        }
    }
}

output {
    elasticsearch {
    }
}

and the following input:

{"date":"2018-02-27T13:21:41.3387552-05:00","level":"INFO","appname":"listenercore","logger":"Main","thread":"1","message":"test"}

I get the following result:

{
  "_index": "logstash-2018.02.27",
  "_type": "doc",
  "_id": "AWHYfh_qDl_9h030IXjC",
  "_score": 1,
  "_source": {
    "type": "genericJson2",
    "@timestamp": "2018-02-27T18:19:59.747Z",
    "host": "10.120.4.5",
    "@version": "1",
    "date": "{\"date\":\"2018-02-27T13:20:02.2113",
    "message": "{\"da",
    "logger": "{\"da",
    "thread": "{",
    "level": "{\"da",
    "appname": "{\"date\":\"201"
  },
  "fields": {
    "@timestamp": [
      "2018-02-27T18:19:59.747Z"
    ]
  }
}

What do I need to do to get my JSON logs parsed correctly?

EDIT

I dug in a little deeper. Running this from the command line:

sudo bin/logstash -e "input{stdin{type=>stdin}} filter{json {source=>message}} output{ stdout{ codec=>rubydebug } }"

produced the desired output:

{
    "@timestamp" => 2018-02-28T02:07:01.710Z,
          "host" => "Elastisearch01",
       "appname" => "listenercore",
        "logger" => "Main",
      "@version" => "1",
          "type" => "stdin",
          "date" => "2018-02-27T13:21:41.3387552-05:00",
         "level" => "INFO",
        "thread" => "1",
       "message" => "test"
}

So I wrote a quick Python UDP server to see what's coming across the wire; here is what I captured:

{ " d a t e " : " 2 0 1 8 - 0 2 - 2 7 T 2 1 : 0 6 : 0 4 . 7 4 6 1 3 4 6 - 0 5 : 0 0 " , " l e v e l " : " I N F O " , " a p p n a m e " : " l i s t e n e r c o r e " , " l o g g e r " : " M a i n " , " t h r e a d " : " 1 " ,   m e s s a g e " : " t e s t " }

There are extra spaces between each character. I'm investigating text encodings, but I'm not sure yet.
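Those "gaps" are consistent with UTF-16: for ASCII text, UTF-16LE stores a NUL byte alongside each character byte, and the NULs render as blank space when the bytes are printed naively. A quick illustrative snippet (not the actual capture, just a sketch of the effect):

import binascii

# Encoding plain ASCII JSON as UTF-16LE interleaves 0x00 bytes
# between the characters -- the "extra spaces" seen on the wire.
payload = '{"message":"test"}'.encode('utf-16-le')
print(binascii.hexlify(payload))    # b'7b0022006d00...' -- a 00 after each ASCII char
print(payload.decode('utf-16-le'))  # decodes back to the original JSON string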

EDIT

Well, I've pretty much verified this is an encoding issue. If I capture, decode, and retransmit the logs with this Python script, it solves the problem:

import socket

UDP_IP_ADDRESS = "10.254.18.166"
UDP_PORT_NO = 9993

serverSock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
serverSock.bind((UDP_IP_ADDRESS, UDP_PORT_NO))
clientSock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

while True:
    # Receive the raw UTF-16 datagram, decode it, and re-send as UTF-8 bytes
    # (sendto() needs bytes, not a decoded string).
    data, addr = serverSock.recvfrom(1024)
    clientSock.sendto(data.decode('utf-16').encode('utf-8'), ("127.0.0.1", 9994))

How can I get Logstash to read my UTF-16 inputs?
I've tried this and it isn't working:

bin/logstash -e "input{udp{port=>9994 type=>stdin codec=>plain{charset=>'UTF-16'}}} filter{json {source=>message}} output{ stdout{ codec=>rubydebug } }"
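One thing that may be worth checking (an assumption on my part, not verified against this sender): Windows/.NET typically emits little-endian UTF-16 without a byte-order mark, and Ruby's generic "UTF-16" encoding expects a BOM to determine byte order, so the endianness-explicit charset might behave differently:

bin/logstash -e "input{udp{port=>9994 type=>stdin codec=>plain{charset=>'UTF-16LE'}}} filter{json {source=>message}} output{ stdout{ codec=>rubydebug } }"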

(Krunal Kalaria) #2

Hi @wjdavis5,

Try the kv filter:

kv {
    field_split => ","
    value_split => ":"
    source => "message"
}

Thanks & Regards,
Krunal.


(Makara) #3

You can try with codec => json inside udp { }
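For reference, a minimal sketch of that suggestion applied to the original config (the json codec parses each datagram at the input stage, so the separate json filter should no longer be needed):

input {
    udp {
        type => "genericJson2"
        port => 9994
        codec => json
    }
}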


(William Davis) #4

Can you please explain what this does? It seems pretty obvious to me that this is an encoding issue. Windows systems use UTF-16, but when I use codec=>plain{charset=>'UTF-16'} it decodes everything as garbage.

However, if I write a Python app to decode the same stream as UTF-16, the text gets decoded perfectly.

I've updated the question to include much more detail.


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.