Why does Filebeat return Unicode escape codes instead of XML tag symbols in the message field? It shows \u003c and \u003e

I'm new here and new to ELK.
I'm trying to use Filebeat to collect XML logs and push them to Kafka, but Filebeat returns Unicode escape codes instead of the XML tag symbols in the message field: < and > show up as \u003c and \u003e.

I'm using Filebeat 6.6.0.

My XML logs look like this:

My filebeat.yml looks like this:

The output I get in Kafka looks like this:
{"@timestamp":"2019-06-03T09:08:43.410Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.6.0","topic":"Topic1"},"beat":{"hostname":"LP-5CD812F3VC","version":"6.6.0","name":"LP-5CD812F3VC"},"host":{"name":"LP-5CD812F3VC"},"offset":0,"log":{"file":{"path":"C:\\WorkSpace\\Filebeat\\logs\\employees - Copy - Copy (2).xml"},"flags":["multiline"]},"message":"\u003cemployees\u003e\n \u003cemployee id=\"111\"\u003e\n \u003cfirstName\u003eManoj\u003c/firstName\u003e\n \u003clastName\u003eSinha\u003c/lastName\u003e\n \u003clocation\u003eIndia\u003c/location\u003e\n \u003c/employee\u003e\n \u003cemployee id=\"222\"\u003e\n \u003cfirstName\u003eAlex\u003c/firstName\u003e\n \u003clastName\u003eGussin\u003c/lastName\u003e\n \u003clocation\u003eRussia\u003c/location\u003e\n \u003c/employee\u003e\n \u003cemployee id=\"333\"\u003e\n \u003cfirstName\u003eDavid\u003c/firstName\u003e\n \u003clastName\u003eFeezor\u003c/lastName\u003e\n \u003clocation\u003eUSA\u003c/location\u003e\n \u003c/employee\u003e","source":"C:\\WorkSpace\\Filebeat\\logs\\employees - Copy - Copy (2).xml","prospector":{"type":"log"},"input":{"type":"log"}}

Can anyone please look into this and help me get the proper XML tag symbols instead of Unicode escape codes from Filebeat?

Regards, Manoj

Do not use screenshots when sharing text. Please paste your configuration here as text and format it using </>.

Okay, thanks. Here is the configuration as text.

Input XML file/logs

<employees>
    <employee id="111">
        <firstName>Manoj</firstName>
        <lastName>Sinha</lastName>
        <location>India</location>
    </employee>
    <employee id="222">
        <firstName>Alex</firstName>
        <lastName>Gussin</lastName>
        <location>Russia</location>
    </employee>
    <employee id="333">
        <firstName>David</firstName>
        <lastName>Feezor</lastName>
        <location>USA</location>
    </employee>
</employees>

My filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:\WorkSpace\Filebeat\logs\*.xml
    
  input_type: log
  document_type: xml
  encoding: UTF-8
    
  multiline.pattern: '^[[:space:]]'
  multiline.negate: false
  multiline.match: after
  
#----------------------------- Kafka output --------------------------------
output.kafka:
  hosts: ["localhost:9092"]

  topic: "Topic1"

The output I get in Kafka looks like this:
{"@timestamp":"2019-06-03T08:44:16.900Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.6.0","topic":"Topic1"},"offset":0,"log":{"file":{"path":"C:\\WorkSpace\\Filebeat\\logs\\employees - Copy - Copy.xml"},"flags":["multiline"]},"message":"\u003cemployees\u003e\n \u003cemployee id=\"111\"\u003e\n \u003cfirstName\u003eManoj\u003c/firstName\u003e\n \u003clastName\u003eSinha\u003c/lastName\u003e\n \u003clocation\u003eIndia\u003c/location\u003e\n \u003c/employee\u003e\n \u003cemployee id=\"222\"\u003e\n \u003cfirstName\u003eAlex\u003c/firstName\u003e\n \u003clastName\u003eGussin\u003c/lastName\u003e\n \u003clocation\u003eRussia\u003c/location\u003e\n \u003c/employee\u003e\n \u003cemployee id=\"333\"\u003e\n \u003cfirstName\u003eDavid\u003c/firstName\u003e\n \u003clastName\u003eFeezor\u003c/lastName\u003e\n \u003clocation\u003eUSA\u003c/location\u003e\n \u003c/employee\u003e","prospector":{"type":"log"},"input":{"type":"log"},"host":{"name":"LP-5CD812F3VC"},"beat":{"hostname":"LP-5CD812F3VC","version":"6.6.0","name":"LP-5CD812F3VC"},"source":"C:\\WorkSpace\\Filebeat\\logs\\employees - Copy - Copy.xml"}

Can anyone please have a look at this and help me get the proper XML tag symbols instead of Unicode escape codes from Filebeat?
@magnusbaeck, @andrewkroh, @pierhugues, @ruflin, can you please help me with this?

Regards, Manoj

Looks like you need to change the UTF encoding since you are on Windows:

encoding: "utf-16le"

I tried every possible combination, but Filebeat always returns Unicode escape codes instead of the XML tag symbols in the message field. I also tried on Linux, and the issue persists there as well.

"message":" \u003cGroupVersion\u003e0\u003c/GroupVersion\u003e"}

Any help would be greatly appreciated.. Thanks
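
A side note on why the encoding setting probably makes no difference here. The names in the output ("Manoj", "India") are readable, which suggests the file is already being decoded correctly; a wrong input encoding would garble the whole line, not just the angle brackets. A minimal Python sketch of that difference (Python is used only for illustration and is not part of Filebeat; the sample line is taken from the XML above):

```python
line = "<employees>"

# Bytes as they would sit in a UTF-8 (or plain ASCII) log file.
utf8_bytes = line.encode("utf-8")

# Decoding with the matching encoding gives the original text back.
correct = utf8_bytes.decode("utf-8")

# Decoding the same bytes as UTF-16LE pairs the bytes up wrongly,
# so every character is garbled, not only '<' and '>'.
misread = utf8_bytes.decode("utf-16le", errors="replace")

print(correct)
print(repr(misread))
```

So \u003c is more likely a JSON string escape for the character U+003C (<) applied when the event is serialized than a symptom of the input encoding.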

What version of Beats are you using? I found this thread that talks about Filebeat behavior in possibly older versions...

@elastikip, I'm using Filebeat version 6.6.0. Kindly help me with this issue.

{"beat":"filebeat","type":"doc","version":"6.6.0","topic":"Topic1"} 

Regards, Manoj

Can someone look into this issue and help me figure it out? Any help would be greatly appreciated.

@magnusbaeck , @andrewkroh , @pierhugues @ruflin

Many thanks, Manoj

Hi,
@tylerjl, @warkolm, @abdon, @Kosho_Owa, @casper

Can someone look into this issue and help me figure it out? Any help would be greatly appreciated.

Regards, Manoj

Have you tried setting the encoding correctly? The accepted UTF-8 encodings are utf8 or utf-8.

Newer versions have an output.elasticsearch.escape_html config option that you can set to false. I think this would help. In Filebeat 7.0 this defaults to false, but earlier versions had it enabled by default.

https://www.elastic.co/guide/en/beats/filebeat/6.4/elasticsearch-output.html#_literal_escape_html_literal
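
To make the effect of an option like escape_html concrete, here is a rough Python sketch of what HTML-safe JSON encoding does. It is an illustration only, not Filebeat's actual code: the helper name encode_event and the escaped character set (<, > and &, the same set Go's standard encoding/json escapes by default) are assumptions for this example.

```python
import json

# Characters that HTML-safe JSON encoders commonly escape (assumed set, for illustration).
HTML_ESCAPES = {"<": "\\u003c", ">": "\\u003e", "&": "\\u0026"}


def encode_event(event: dict, escape_html: bool) -> str:
    """Hypothetical helper: serialize an event, optionally HTML-escaping it."""
    text = json.dumps(event)
    if escape_html:
        # In JSON text, <, > and & can only occur inside strings,
        # so a plain replace does not damage the structure.
        for char, escape in HTML_ESCAPES.items():
            text = text.replace(char, escape)
    return text


event = {"message": "<firstName>Manoj</firstName>"}
escaped = encode_event(event, escape_html=True)
plain = encode_event(event, escape_html=False)

print(escaped)
print(plain)
# Both spellings are valid JSON and decode to the same event.
print(json.loads(escaped) == json.loads(plain))
```

The two outputs differ only in how the string is spelled on the wire; any JSON parser turns both back into the same message.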

Thanks @andrewkroh for your suggestion.

I set escape_html: false as shown below but still get the same issue. I'm new to Filebeat, so can you please let me know if I'm missing anything? Just for your reference, I'm using Filebeat to collect XML logs and push them to a Kafka topic.

#----------------------------- Kafka output --------------------------------
output.kafka:
  # initial brokers for reading cluster metadata
  hosts: ["localhost:9092"]

  # message topic selection + partitioning
  topic: "Topic1"
  
  codec.json:
    pretty: true
    escape_html: false

Output logs at Kafka topic:

  "message": "\u003cemployees\u003e\n    \u003cemployee id=\"111\"\u003e\n        \u003cfirstName\u003eLokesh\u003c/firstName\u003e\n        \u003clastName\u003eGupta\u003c/lastName\u003e\n        \u003clocation\u003eIndia\u003c/location\u003e\n    \u003c/employee\u003e\n    \u003cemployee id=\"222\"\u003e\n        \u003cfirstName\u003eAlex\u003c/firstName\u003e\n        \u003clastName\u003eGussin\u003c/lastName\u003e\n        \u003clocation\u003eRussia\u003c/location\u003e\n    \u003c/employee\u003e\n    \u003cemployee id=\"333\"\u003e\n        \u003cfirstName\u003eDavid\u003c/firstName\u003e\n        \u003clastName\u003eFeezor\u003c/lastName\u003e\n        \u003clocation\u003eUSA\u003c/location\u003e\n    \u003c/employee\u003e",
  "source": "C:\\WorkSpace\\logs\\employees - Copy.xml",

It seems the escape_html setting doesn't have any effect in 6.x. I tried 7.1.1 and those escape characters went away.

Thanks @andrewkroh!!!

The Unicode character issue is fixed in the latest version of Filebeat. I tested locally with version 7.0.1 (filebeat-7.0.1-windows-x86_64) using the same configuration (the v6.6.0 yml file) and got the expected XML tags in the Kafka topic.

But I have to stay on Filebeat v6.6.0, and I still haven't been able to figure out this issue.

Regards, Manoj

I think opening a bug report on GitHub is the next step. It looks like escape_html is not being honored, and some debugging is required.
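
In the meantime, a possible workaround for staying on 6.6.0 is to leave the escapes alone and decode the event as JSON on the consumer side, since a JSON parser restores \u003c and \u003e to < and >. A minimal Python sketch, assuming the consumer receives the raw record value as bytes; the helper name message_from_record and the shortened sample event are made up for this example, and no Kafka client is shown:

```python
import json
import xml.etree.ElementTree as ET


def message_from_record(raw_value: bytes) -> str:
    """Hypothetical helper: return the 'message' field of one Filebeat event."""
    event = json.loads(raw_value.decode("utf-8"))
    return event["message"]


# Stand-in for the value bytes of one Kafka record (shortened sample event).
raw_value = rb'{"message": "\u003clocation\u003eIndia\u003c/location\u003e"}'

xml_text = message_from_record(raw_value)
print(xml_text)

# After JSON decoding, the text is ordinary XML again.
location = ET.fromstring(xml_text)
print(location.tag, location.text)
```

This only helps if the consumer parses the payload as JSON; something that reads the Kafka value as plain text would still see the escaped form.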