UDP input codec

Laurent_DEGEN · September 6, 2022, 6:24am

Hi,
I have an NB-IOT sensor that outputs UDP packets containing some data.
I used the default UDP input plugin and the default plain codec, but it does not decode the packet's UDP payload the way I want it to (headers work fine).

Using Wireshark, I noticed that the sensor's payload data is sent to Logstash as raw hex bytes and not as ASCII-hex values (unlike the other sensors I have used).

I tried to find another plugin or codec that does not expect ASCII-hex values, but with no luck...
Do you have any codec or plugin to achieve this? It seems fairly basic, so I'm probably not the first one to have this issue.

Here is a more detailed version of what is happening:
The UDP payload the sensor sends is: 868963044646776002000000044684000000010000000012253103 (hex values as shown in Wireshark)
What my input config outputs is:
"\x86\x89c\u0004FFw`\u0002\u0000\u0000\u0000\u0004F\x84\u0000\u0000\u0000\u0001\u0000\u0000\u0000\u0000\u0012%1\u0003"
What I would like my input config to output is:
"868963044646776002000000044684000000010000000012253103" as a string.

Thanks for any help

Best regards,

Laurent

Rios · September 6, 2022, 8:15am

It's related to the codec; the default charset is UTF-8. Try setting one of ASCII, ISO-8859-1, US-ASCII, Windows-1252 or CP1252. I tried to convert your sample with several charsets but without success, possibly because the UTF-8 conversion has already mangled it.

Add this in your input:

	codec => plain {
		charset => "ASCII"
	}

Badger · September 6, 2022, 3:58pm

I think you will need to do this in two parts. You could try using

codec => plain { charset => "ASCII-8BIT" }

ASCII-8BIT (a.k.a BINARY) will just consume the input in 1 byte pieces. You then want a string representation of it. To do that you will need a ruby filter. Probably a string unpack to convert the string to an array of bytes, then iterate over the array to append each one in hex to a string.
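
The unpack-and-append step described above can be sketched in plain Ruby, outside Logstash. This is only an illustration: `to_hex` is a hypothetical helper name, and the payload is rebuilt from the hex string in the first post rather than read from a real event. It assumes the bytes reach the filter unmodified.

```ruby
# Hypothetical helper (not part of Logstash): hex-encode a binary string,
# producing two lowercase hex digits per byte.
def to_hex(raw)
  raw.unpack("C*")                            # string -> array of byte values (0..255)
     .map { |byte| format("%02x", byte) }     # each byte -> two hex digits
     .join                                    # array -> one string
end

# Rebuild the sample payload from the hex string quoted in the first post.
sample_hex = "868963044646776002000000044684000000010000000012253103"
payload = [sample_hex].pack("H*")             # 27 raw bytes, encoding ASCII-8BIT

puts to_hex(payload)                          # prints the original hex string
puts payload.unpack1("H*")                    # one-call equivalent, same output
```

Note that `unpack("H*")` returns a one-element array, so `unpack1("H*")` (or `.first`) is needed to get a plain string.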

Laurent_DEGEN · September 8, 2022, 2:42am

Hello,
thank you for your reply. I tried implementing Badger's solution and here is my conf:

input {

  udp{
    port => 7979
    tags => ["nbiot","udp"]
    codec => plain { charset => "ASCII-8BIT" }
  }

}

filter {
  if "nbiot" in [tags] {
    ruby {
        code =>'
                stri = event.get("[message]")
                event.set("[o][data][messageSize]", stri.size)
                event.set("[o][data][unpack_H*]", stri.unpack("H*"))
             '
    }
  } 
}

I do not know if this is what you intended me to do @Badger, but this is what I understood.

The issue is that it does not output what I expect. For example, when 0x86 is sent we get 0xefbfbd, which seems to match the UTF-8 "replacement character". What I understand is that if the input is not recognized as a character (from the ASCII-8BIT table, I guess), it is replaced with something else. But again, I only need to get the original binary value (I do not need any character interpretation).

Any idea? I can't believe that I need to create my own codec/plugin for that?
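
The 0x86 to 0xefbfbd substitution described above can be reproduced in plain Ruby. This is a sketch under an assumption: that the codec transcodes the ASCII-8BIT input to UTF-8 with replacement enabled, which is a guess based on the observed output and not something confirmed in this thread.

```ruby
# One raw byte, tagged as binary (ASCII-8BIT).
raw = "\x86".b

# Assumption: the codec does something equivalent to this transcoding step.
# Bytes >= 0x80 have no defined mapping from ASCII-8BIT to UTF-8, so with
# undef: :replace Ruby substitutes U+FFFD (the replacement character).
converted = raw.encode("UTF-8", invalid: :replace, undef: :replace)

puts raw.unpack1("H*")        # hex of the original byte
puts converted.unpack1("H*")  # hex of the UTF-8 bytes after transcoding
```

If that assumption holds, the original byte is already lost by the time the ruby filter runs, which is why unpacking in the filter cannot recover it.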

Badger · September 8, 2022, 3:28am

Sorry, I have no idea. I was guessing what you needed to do. Basically you want logstash to consume binary data and it is fundamentally not intended to do that. There may be options that allow it but it is not something I have ever tried.

Rios · September 8, 2022, 3:59am

A similar topic has been opened here

Can you check with tcpdump which hex values and visible characters are sent?
Also try saving the capture to a dump file and opening it in Notepad++ to see the character set. Maybe I'm going in the wrong direction, but I don't see any problem other than the charset.

Laurent_DEGEN · September 9, 2022, 5:13am

Hi,

Sorry, the link you sent is broken.

I managed to solve my problem by modifying my local version of the plain codec (very very dirty) to completely bypass the charset transcoding.
I intend to write a specific codec plugin to properly address the issue (and revert my plain codec to its official version). I tried today but it does not work yet. I intend to open a specific topic about it. I'll try to remember to link the solution once it is done.

Thank you everyone
Best regards,

Laurent

Rios · September 9, 2022, 7:48am

How? By converting the hex bytes to ASCII? What was the difference between 868963044646776002000000044684000000010000000012253103 and what Logstash received?

system · October 7, 2022, 7:49am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
