Management of illegal characters

Hello Everyone,
I am in the process of converting an "in-house" Delphi application that receives an XML string over a UDP port and sends it to an Elasticsearch instance, converting the XML to JSON and sending it through the _bulk API.

Now I am trying to receive the XML string with the UDP input plugin, and it turns out that this XML string contains illegal characters, such as #0 (the NUL string terminator).
I am using this configuration:

input {
	udp {
		port => 517
	}
}
filter {
	xml {
		force_array => false
		source => "message"
		target => "myxml"
	}
}
output {
	file {
		path => "/log_streaming/my_app/records/log-%{+yyyy-MM-dd_HH.mm.ss.SSS}.log"	
		codec => line { format => "%{myxml}" }
	}
}

When everything is well formed, I receive data in this format:

{
    "APPVERSION": "1.0.1.11",
    "EVENTDATETIME": "04/14/2025 18:38:20:203",
    "EVENTNAME": "TestEvent\n2\n04/14/2025 18:38:20:38",
    "APPLICATION": "TESTUDPLOGGER",
    "HOST": "FRANCESCOE-RMT",
    "EVENTINFO": "04/14/2025 18:38:20:38",
    "LINENO": "1",
    "INSTANCEID": "BD2051FB-525E-49CD-BEDB-3DEF967ADCFB",
    "SEVERITY": "0",
    "THREADID": "13852",
    "EVENTSEQNO": "1"
}

But if a #0 is received, I get this error:

Illegal character "\u0000" in raw string "04/15/2025 11:55:55:55\u0000ben\u0000frank\u0000sue"

Writing what I receive to disk (with the filter section removed), I get:

{"@timestamp":"2025-04-14T17:52:37.590740500Z","event":{"original":"<EVENT><HOST>FRANCESCOE-RMT</HOST><INSTANCEID>65C3FEC1-B288-437F-B0C3-8CA3EB1956EC</INSTANCEID><APPLICATION>TESTUDPLOGGER</APPLICATION><THREADID>7080</THREADID><APPVERSION>1.0.1.11</APPVERSION><LINENO>1</LINENO><EVENTSEQNO>1</EVENTSEQNO><EVENTDATETIME>04/14/2025 13:52:37:587</EVENTDATETIME><SEVERITY>0</SEVERITY><EVENTNAME>TestEvent\r\n1\r\n04/14/2025 13:52:37:52</EVENTNAME><EVENTINFO>04/14/2025 13:52:37:52\u0000ben\u0000frank\u0000sue</EVENTINFO></EVENT>"},"host":{"ip":"127.0.0.1"},"@version":"1","message":"<EVENT><HOST>FRANCESCOE-RMT</HOST><INSTANCEID>65C3FEC1-B288-437F-B0C3-8CA3EB1956EC</INSTANCEID><APPLICATION>TESTUDPLOGGER</APPLICATION><THREADID>7080</THREADID><APPVERSION>1.0.1.11</APPVERSION><LINENO>1</LINENO><EVENTSEQNO>1</EVENTSEQNO><EVENTDATETIME>04/14/2025 13:52:37:587</EVENTDATETIME><SEVERITY>0</SEVERITY><EVENTNAME>TestEvent\r\n1\r\n04/14/2025 13:52:37:52</EVENTNAME><EVENTINFO>04/14/2025 13:52:37:52\u0000ben\u0000frank\u0000sue</EVENTINFO></EVENT>"}

As you can see, it was converted to "\u0000". I need to convert #0, #13#10, #13 and #10 to a single space character. How can I do that?

Sorry everyone, it was simply:

input {
	udp {
		port => 517
	}
}
filter {
	mutate { gsub => [ "message", "\u0000", "[0x00]" ] }
	mutate { gsub => [ "message", "\r\n", "[0x01]" ] }
	mutate { gsub => [ "message", "\r", "[0x02]" ] }
	mutate { gsub => [ "message", "\n", "[0x03]" ] }
	xml {
		force_array => false
		source => "message"
		target => "myxml"
	}
}
output {
	file {
		path => "/log_streaming/my_app/records/log-%{+yyyy-MM-dd_HH.mm.ss.SSS}.log"	
		codec => line { format => "%{myxml}" }
	}
}
You can do that with a single gsub:

	mutate { gsub => [ "message", "[\r\n^@]", " " ] }

That ^@ is a literal NUL character. In vim I can type it using Ctrl/v Ctrl/Shift/2
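If you would rather not embed a literal control character in the configuration, the same replacement should also work with the \u0000 escape inside the pattern, as in your own gsubs above, for example:

	mutate { gsub => [ "message", "[\r\n]|\u0000", " " ] }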

I then get

        "EVENTNAME" => "TestEvent1 04/14/2025 14:11:24:11",
        "EVENTINFO" => "04/14/2025 14:11:24:11 ben frank sue"

using the example from your other thread.

Okay, that's good, @Badger. I have used my own formatting to stay consistent with what I had before in my Delphi application, but your solution works.

{
    "APPVERSION": "1.0.1.12",
    "EVENTDATETIME": "04/15/2025 15:00:41:955",
    "EVENTNAME": "TestEvent[0x01]9[0x01]04/15/2025 15:00:41:00",
    "APPLICATION": "TESTUDPLOGGER",
    "EVENTINFO": "04/15/2025 15:00:41:00[0x00]ben[0x01]frank[0x02]sue[0x03]john",
    "HOST": "FRANCESCOE-RMT",
    "LINENO": "1",
    "INSTANCEID": "CA61547F-3154-48A9-A50D-E650D9243F8E",
    "SEVERITY": "0",
    "THREADID": "16912",
    "EVENTSEQNO": "1"
}

This is my file now.

If I want to send it to Elasticsearch, shouldn't it be enough to enable the elasticsearch output plugin like this?

input {
	udp {
		port => 517
	}
}
filter {
	mutate { gsub => [ "message", "\u0000", "[0x00]" ] }
	mutate { gsub => [ "message", "\r\n", "[0x01]" ] }
	mutate { gsub => [ "message", "\r", "[0x02]" ] }
	mutate { gsub => [ "message", "\n", "[0x03]" ] }
	xml {
		force_array => false
		source => "message"
		target => "myxml"
	}
}
output {
	file {
		path => "/log_streaming/my_app/records/log-%{+yyyy-MM-dd_HH.mm.ss.SSS}.log"
		codec => line { format => "%{myxml}" }
	}
	elasticsearch {
		hosts => "https://myhost:9200"
		user => "myuser"
		password => "mypassword"
		ssl_certificate_verification => "false"
		index => "delphi-processlog_write"
		codec => line { format => "%{myxml}" }
	}
}

I ask because I see my data indexed into fields named "myxml.<my field>".

Every output lets you specify the codec option because it is defined in the base output class that they all extend. But that doesn't mean they use the codec when sending the event to the destination.

The elasticsearch output ignores the codec option and formats the event as a JSON string, because that's what the _bulk API requires.
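In practice that means the codec option on the elasticsearch output can simply be dropped, since it has no effect there, while the file output can keep it. A sketch of the output section, reusing your settings:

output {
	file {
		path => "/log_streaming/my_app/records/log-%{+yyyy-MM-dd_HH.mm.ss.SSS}.log"
		codec => line { format => "%{myxml}" }
	}
	elasticsearch {
		hosts => "https://myhost:9200"
		user => "myuser"
		password => "mypassword"
		ssl_certificate_verification => "false"
		index => "delphi-processlog_write"
	}
}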

If you use store_xml you have to specify a target. If you want to move the fields back up to the top level of the event, see this thread.
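One common way to do that promotion (a sketch, not necessarily what the linked thread does) is a ruby filter placed after the xml filter that copies every key of [myxml] to the top level and then removes the container field:

	ruby {
		code => '
			xml = event.get("myxml")
			if xml.is_a?(Hash)
				# copy each parsed XML field up to the top level of the event
				xml.each { |k, v| event.set(k, v) }
				# drop the now-redundant container field
				event.remove("myxml")
			end
		'
	}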

It works perfectly, thanks! Last question, @Badger: I receive a date in this format:

04/15/2025 16:28:09:361

That gives me this error:

Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"delphi-processlog_write", :routing=>nil}, {"eventseqno"=>"1", "exeversion"=>"1.0.1.12", "severity"=>"0", "@timestamp"=>2025-04-15T20:28:09.363306700Z, "apphost"=>"FRANCESCOE-RMT", "exename"=>"TESTUDPLOGGER", "eventname"=>"TestEvent[0x01]5[0x01]04/15/2025 16:28:09:28", "logeventdate"=>"04/15/2025 16:28:09:361", "eventinfo"=>"04/15/2025 16:28:09:28[0x00]ben[0x01]frank[0x02]sue[0x03]john", "instanceid"=>"0518D8D7-28CD-4AF3-99DC-8D4CE30CCC6E", "threadid"=>"23476"}], :response=>{"index"=>{"status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [logeventdate] of type [date] in document with id 'GkcjO5YBP4xpwjYARoME'. Preview of field's value: '04/15/2025 16:28:09:361'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"failed to parse date field [04/15/2025 16:28:09:361] with format [yyyy/MM/dd HH:mm:ss.SSS||yyyy-MM-dd HH:mm:ss.SSS||strict_date_optional_time||epoch_millis]", "caused_by"=>{"type"=>"date_time_parse_exception", "reason"=>"date_time_parse_exception: Failed to parse with all enclosed parsers"}}}}}}

How can I make Elasticsearch accept this format, or how can I convert it?

Your elasticsearch index is configured to expect a logeventdate in one of the formats

yyyy/MM/dd HH:mm:ss.SSS||yyyy-MM-dd HH:mm:ss.SSS||strict_date_optional_time||epoch_millis

Your field actually has the format yyyy/MM/dd HH:mm:ss:SSS. I don't know if you can add an additional date format to an existing index (obviously you can update the index template so that future indexes will accept that format).

Otherwise, change the colon before the milliseconds to a full stop using

    mutate { gsub => [ "EVENTDATETIME", "(\d{2}):(\d{3})$", "\1.\2" ] }

Unfortunately, my format turns out to be

MM/dd/yyyy HH:mm:ss:SSS

So that's not enough... :frowning:

You could parse it using a date filter. I believe that will get sent to elasticsearch in an acceptable format.
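For example, a sketch (the field name logeventdate is taken from the error above, and the pattern assumes MM/dd/yyyy HH:mm:ss:SSS):

	date {
		match => [ "logeventdate", "MM/dd/yyyy HH:mm:ss:SSS" ]
		target => "logeventdate"
	}

Without a target the parsed value goes to @timestamp; with target => "logeventdate" the field is rewritten as an ISO 8601 timestamp, which matches the strict_date_optional_time format the index already accepts. Note that the filter assumes the Logstash host's local timezone unless you set the timezone option.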

If not, you can reformat it using ruby. See this thread.
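As a sketch of that approach (not necessarily what the linked thread shows), a ruby filter could rewrite the string into the yyyy-MM-dd HH:mm:ss.SSS form the index already accepts; again, the field name logeventdate is taken from the error above:

	ruby {
		init => "require 'time'"
		code => '
			v = event.get("logeventdate")
			if v
				begin
					# parse MM/dd/yyyy HH:mm:ss:SSS and re-emit in a format the index accepts
					t = Time.strptime(v, "%m/%d/%Y %H:%M:%S:%L")
					event.set("logeventdate", t.strftime("%Y-%m-%d %H:%M:%S.%L"))
				rescue ArgumentError
					event.tag("_logeventdateparsefailure")
				end
			end
		'
	}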

You could also reformat it using a more complex gsub, but that doesn't feel right to me.

mutate { gsub => [ "someField", "(\d{2})/(\d{2})/(\d{4}) (\d{2}:\d{2}:\d{2}):(\d{3})", "\3/\1/\2 \4.\5" ] }

Just looking at that makes my eyes bleed!