Why does a "?" character appear at the start of the output message from the Filebeat input?

I have log files whose lines are in XML format, like this:

<?xml version="1.0" encoding="utf-8"?><UrlTrackingObj xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><Url>http://vietnamnet.vn/</Url><Action>view</Action><Ip>123.24.12.229</Ip><Os>Windows 10</Os><Browser>Chrome 58.0</Browser></UrlTrackingObj>
<?xml version="1.0" encoding="utf-8"?><UrlTrackingObj xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><Url>http://vietnamnet.vn/</Url><Action>view</Action><Ip>171.249.131.220</Ip><Os>Windows XP</Os><Browser>Chrome 50.3</Browser></UrlTrackingObj>
<?xml version="1.0" encoding="utf-8"?><UrlTrackingObj xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><Url>http://vietnamnet.vn/</Url><Action>view</Action><Ip>27.2.64.174</Ip><Os>Windows 7</Os><Browser>Chrome 59.0</Browser></UrlTrackingObj>

Then I use Filebeat to read these log files and send them to Logstash.
Here are the Filebeat and Logstash configs:

filebeat.prospectors:
- input_type: log
  paths:
    - D:\works\logs\*.log

and:

input {
  beats {
    port => "5043"
  }
}
filter {
  xml {
    source => "message"
    store_xml => false
  }
}
output {
  stdout { codec => json }
}

But the console output looks like this:

{
	"@timestamp":"2017-06-29T09:22:44.536Z",
	"offset":1353334,
	"@version":"1",
	"input_type":"log",
	"beat":{
		"hostname":"sonnt-pc","name":"sonnt-pc","version":"5.4.3"
	},
	"host":"sonnt-pc",
	"source":"D:\\works\\logs\\2017062901.log",
	"message":"?<?xml version=\"1.0\" encoding=\"utf-8\"?><UrlTrackingObj xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"><Url>http://vietnamnet.vn/</Url><Action>view</Action><Ip>140.118.138.195</Ip><Os>Windows 8.1</Os><Browser>Firefox 53.0</Browser></UrlTrackingObj>",
	"type":"log",
	"tags":["beats_input_codec_plain_applied"]
}

The problems are:

  1. Why does the message contain "?" as its first character? (The line in the log file does not have it.)

  2. I want the output to look only like this:

{
	"@timestamp":"2017-06-29T09:22:44.536Z",
	"Url": "http://vietnamnet.vn/",
	"Action": "view",
	"Ip": "140.118.138.195",
	"Os": "Windows 8.1",
	"Browser": "Firefox 53.0"
}
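For reference, the flattening asked for in item 2 can be sketched outside Logstash. The Python below is only an illustration, not a Logstash setting: it parses one log line with the standard library's `xml.etree.ElementTree` and keeps the five child elements plus a timestamp. The names `flatten_tracking_line` and `WANTED` are hypothetical. On the Logstash side, note that `store_xml => false` with no `xpath` mappings means the `xml` filter in the config above adds no fields at all; a likely starting point is `store_xml => true` with a `target` (or `xpath` mappings) and then removing the fields you don't want, but check that against the `xml` filter documentation for your version.

```python
import xml.etree.ElementTree as ET

# Fields the poster wants in the final event (hypothetical constant, taken from the desired output).
WANTED = ("Url", "Action", "Ip", "Os", "Browser")


def flatten_tracking_line(line: str, timestamp: str) -> dict:
    """Turn one <UrlTrackingObj> log line into a flat dict of the wanted fields plus @timestamp."""
    # Drop a leading byte order mark (U+FEFF) if present; otherwise the parser
    # would reject the line because the XML declaration is no longer at the start.
    root = ET.fromstring(line.lstrip("\ufeff"))
    event = {"@timestamp": timestamp}
    for name in WANTED:
        # The child elements carry no namespace prefix, so a plain tag lookup works.
        # findtext returns None if an element is missing.
        event[name] = root.findtext(name)
    return event
```

The `xmlns:xsd` and `xmlns:xsi` declarations on the root only bind prefixes; since `Url`, `Ip` and the rest are unprefixed, no namespace map is needed for the lookups.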

Have you looked at your XML file in a hex editor and verified that the file really begins with <?xml? I suspect the question mark might be a byte order mark.
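One way to do this check without a hex editor is to read the first raw bytes of the file. A UTF-8 byte order mark is the three bytes `EF BB BF`; the sketch below uses hypothetical helper names `read_head` and `has_utf8_bom` and reads bytes without decoding them, so no editor can hide the mark.

```python
import codecs


def read_head(path: str, n: int = 8) -> bytes:
    """Read the first n raw bytes of a file, undecoded, as a hex editor would show them."""
    with open(path, "rb") as f:
        return f.read(n)


def has_utf8_bom(head: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark EF BB BF."""
    return head.startswith(codecs.BOM_UTF8)
```

For example, `read_head(r"D:\works\logs\2017062901.log").hex(" ")` (Python 3.8+ for the separator argument) should start with `ef bb bf` if the file carries a BOM, and with `3c 3f` (`<?`) if it does not.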


@magnusbaeck: Yes, I've checked these files. The line begins with some extra characters that show up when the file is viewed in ANSI encoding, but not in UTF-8 encoding.
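That behaviour is what a UTF-8 BOM looks like. The sketch below assumes "ANSI" means Windows-1252, the usual code page on Western-locale Windows, and shows why the same three bytes appear as three characters in one view and as nothing in the other. The `?` in the Logstash output is most likely a console that cannot display the single invisible U+FEFF character, though that depends on the console's code page.

```python
# The UTF-8 encoding of U+FEFF, the character used as a byte order mark.
BOM = b"\xef\xbb\xbf"

# Decoded as UTF-8, the three bytes become one invisible character.
as_utf8 = BOM.decode("utf-8")

# Decoded as Windows-1252 (assumed to be what the editor calls "ANSI"),
# each byte becomes its own visible character: "ï»¿".
as_ansi = BOM.decode("cp1252")

# Python's "utf-8-sig" codec drops a leading BOM while decoding.
clean = (BOM + b"<?xml").decode("utf-8-sig")

print(repr(as_utf8), repr(as_ansi), repr(clean))
```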
And I want the JSON output to look like this:

{
	"@timestamp":"2017-06-29T09:22:44.536Z",
	"Url": "http://vietnamnet.vn/",
	"Action": "view",
	"Ip": "140.118.138.195",
	"Os": "Windows 8.1",
	"Browser": "Firefox 53.0"
}

What settings do I need to add to the config?

That sounds like byte order marks. You might be able to use a mutate filter's gsub option to remove those bytes. Otherwise use a ruby filter.
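A minimal sketch of the gsub idea, written in Python rather than Logstash config: an anchored regular expression removes one U+FEFF from the start of the already-decoded message and leaves the rest of the line alone. In Logstash the corresponding `mutate` setting would presumably be something like `gsub => ["message", "^\uFEFF", ""]`, but that exact string is an assumption here and should be tested against your Logstash version. `strip_bom` is a hypothetical name.

```python
import re

# Matches a single U+FEFF only at the very start of the string
# (no MULTILINE flag, so ^ cannot match later in the line).
LEADING_BOM = re.compile(r"^\ufeff")


def strip_bom(message: str) -> str:
    """Remove one leading byte order mark from an already-decoded log line."""
    return LEADING_BOM.sub("", message)
```

Anchoring with `^` keeps the replacement from touching a U+FEFF that happens to appear later in a line.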

Thanks @magnusbaeck. I fixed it.
