Error parsing xml with XmlSimple: Bad encoding name

Hi guys, I've been having trouble parsing EPO syslog with Logstash v7.8.1 and v6.5.4. The data is plain, simple XML, but the xml filter fails to parse it.

Sending Logstash logs to /home/XXX/Desktop/logstash-6.5.4/logs which is now configured via log4j2.properties
[2020-08-16T21:27:29,882][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2020-08-16T21:27:29,895][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.5.4"}
[2020-08-16T21:27:33,661][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2020-08-16T21:27:34,444][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x74dd2afc run>"}
[2020-08-16T21:27:34,505][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2020-08-16T21:27:34,527][INFO ][filewatch.observingtail  ] START, creating Discoverer, Watch with file and sincedb collections
[2020-08-16T21:27:34,863][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2020-08-16T21:27:35,629][WARN ][logstash.filters.xml     ] Error parsing xml with XmlSimple {:source=>"message", :value=>"<?xml version=\"\"1.0\"\" encoding=\"\"utf-8\"\"?><UpdateEvents><MachineInfo><AgentGUID>{00000000-0000-0000-0000-000000000000}</AgentGUID><MachineName>XXXXXXXX</MachineName><RawMACAddress>XXXXXXXX</RawMACAddress><IPAddress>XXXXXXXXX</IPAddress><AgentVersion>XXXXXXXX</AgentVersion><OSName>XXXXXXXX</OSName><TimeZoneBias>XXXXXXXXX</TimeZoneBias><UserName>XXXXXXXXX</UserName></MachineInfo><McAfeeCommonUpdater ProductName=\"\"McAfee Agent\"\" ProductVersion=\"\"5.0.0\"\" ProductFamily=\"\"TVD\"\"><UpdateEvent><EventID>XXXXXXXXX</EventID><Severity>XXXXXXXXX</Severity><GMTTime>XXXXXXXXX</GMTTime><ProductID>XXXXXXXXX</ProductID><Locale>XXXXXXXXX</Locale><Error>XXXXXXXXX</Error><Type>XXXXXXXXX</Type><Version>XXXXXXXXX</Version><InitiatorID>XXXXXXXXX</InitiatorID><InitiatorType>XXXXXXXXX</InitiatorType><SiteName>eXXXXXXXXX</SiteName></UpdateEvent></McAfeeCommonUpdater></UpdateEvents>", :exception=>#<REXML::ParseException: #<ArgumentError: Bad encoding name >
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/encoding.rb:13:in `encoding='
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/source.rb:57:in `encoding='
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/baseparser.rb:218:in `pull_event'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/baseparser.rb:185:in `pull'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/treeparser.rb:23:in `parse'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:288:in `build'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:45:in `initialize'
/home/xxx/Desktop/logstash-6.5.4/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:971:in `parse'
/home/xxx/Desktop/logstash-6.5.4/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:164:in `xml_in'
/home/xxx/Desktop/logstash-6.5.4/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in `xml_in'
/home/xxx/Desktop/logstash-6.5.4/vendor/bundle/jruby/2.3.0/gems/logstash-filter-xml-4.0.6/lib/logstash/filters/xml.rb:190:in `filter'
/home/xxx/Desktop/logstash-6.5.4/logstash-core/lib/logstash/filters/base.rb:143:in `do_filter'
/home/xxx/Desktop/logstash-6.5.4/logstash-core/lib/logstash/filters/base.rb:162:in `block in multi_filter'
org/jruby/RubyArray.java:1734:in `each'
/home/xxx/Desktop/logstash-6.5.4/logstash-core/lib/logstash/filters/base.rb:159:in `multi_filter'
/home/xxx/Desktop/logstash-6.5.4/logstash-core/lib/logstash/filter_delegator.rb:44:in `multi_filter'
(eval):58:in `block in initialize'
org/jruby/RubyArray.java:1734:in `each'
(eval):55:in `block in initialize'
(eval):42:in `block in filter_func'
/home/xxx/Desktop/logstash-6.5.4/logstash-core/lib/logstash/pipeline.rb:341:in `filter_batch'
/home/xxx/Desktop/logstash-6.5.4/logstash-core/lib/logstash/pipeline.rb:320:in `worker_loop'
/home/xxx/Desktop/logstash-6.5.4/logstash-core/lib/logstash/pipeline.rb:286:in `block in start_workers'
...
Bad encoding name 
Line: 1
Position: 866
Last 80 unconsumed characters:
>, :backtrace=>["uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/treeparser.rb:96:in `parse'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:288:in `build'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:45:in `initialize'", "/home/xxx/Desktop/logstash-6.5.4/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:971:in `parse'", "/home/xxx/Desktop/logstash-6.5.4/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:164:in `xml_in'", "/home/xxx/Desktop/logstash-6.5.4/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in `xml_in'", "/home/xxx/Desktop/logstash-6.5.4/vendor/bundle/jruby/2.3.0/gems/logstash-filter-xml-4.0.6/lib/logstash/filters/xml.rb:190:in `filter'", "/home/xxx/Desktop/logstash-6.5.4/logstash-core/lib/logstash/filters/base.rb:143:in `do_filter'", "/home/xxx/Desktop/logstash-6.5.4/logstash-core/lib/logstash/filters/base.rb:162:in `block in multi_filter'", "org/jruby/RubyArray.java:1734:in `each'", "/home/xxx/Desktop/logstash-6.5.4/logstash-core/lib/logstash/filters/base.rb:159:in `multi_filter'", "/home/xxx/Desktop/logstash-6.5.4/logstash-core/lib/logstash/filter_delegator.rb:44:in `multi_filter'", "(eval):58:in `block in initialize'", "org/jruby/RubyArray.java:1734:in `each'", "(eval):55:in `block in initialize'", "(eval):42:in `block in filter_func'", "/home/xxx/Desktop/logstash-6.5.4/logstash-core/lib/logstash/pipeline.rb:341:in `filter_batch'", "/home/xxx/Desktop/logstash-6.5.4/logstash-core/lib/logstash/pipeline.rb:320:in `worker_loop'", "/home/xxx/Desktop/logstash-6.5.4/logstash-core/lib/logstash/pipeline.rb:286:in `block in start_workers'"]}
{
          "host" => "ubuntu",
       "message" => "<?xml version=\"\"1.0\"\" encoding=\"\"utf-8\"\"?><UpdateEvents><MachineInfo><AgentGUID>{00000000-0000-0000-0000-000000000000}</AgentGUID><MachineName>XXXXXXXX</MachineName><RawMACAddress>XXXXXXXX</RawMACAddress><IPAddress>XXXXXXXXX</IPAddress><AgentVersion>XXXXXXXX</AgentVersion><OSName>XXXXXXXX</OSName><TimeZoneBias>XXXXXXXXX</TimeZoneBias><UserName>XXXXXXXXX</UserName></MachineInfo><McAfeeCommonUpdater ProductName=\"\"McAfee Agent\"\" ProductVersion=\"\"5.0.0\"\" ProductFamily=\"\"TVD\"\"><UpdateEvent><EventID>XXXXXXXXX</EventID><Severity>XXXXXXXXX</Severity><GMTTime>XXXXXXXXX</GMTTime><ProductID>XXXXXXXXX</ProductID><Locale>XXXXXXXXX</Locale><Error>XXXXXXXXX</Error><Type>XXXXXXXXX</Type><Version>XXXXXXXXX</Version><InitiatorID>XXXXXXXXX</InitiatorID><InitiatorType>XXXXXXXXX</InitiatorType><SiteName>eXXXXXXXXX</SiteName></UpdateEvent></McAfeeCommonUpdater></UpdateEvents>",
          "type" => "mcafee-epo",
      "@version" => "1",
          "path" => "/home/xxx/Desktop/logstash-6.5.4/raw/epo_syslog.txt",
          "tags" => [
        [0] "_xmlparsefailure"
    ],
    "@timestamp" => 2020-08-16T14:27:35.031Z
}
[2020-08-16T21:28:09,026][WARN ][logstash.runner          ] SIGINT received. Shutting down.
[2020-08-16T21:28:09,139][INFO ][filewatch.observingtail  ] QUIT - closing all files and shutting down.
[2020-08-16T21:28:09,253][INFO ][logstash.pipeline        ] Pipeline has terminated {:pipeline_id=>"main", :thread=>"#<Thread:0x74dd2afc run>"}

Raw log

<?xml version=""1.0"" encoding=""utf-8""?><UpdateEvents><MachineInfo><AgentGUID>{00000000-0000-0000-0000-000000000000}</AgentGUID><MachineName>XXXXXXXX</MachineName><RawMACAddress>XXXXXXXX</RawMACAddress><IPAddress>XXXXXXXXX</IPAddress><AgentVersion>XXXXXXXX</AgentVersion><OSName>XXXXXXXX</OSName><TimeZoneBias>XXXXXXXXX</TimeZoneBias><UserName>XXXXXXXXX</UserName></MachineInfo><McAfeeCommonUpdater ProductName=""McAfee Agent"" ProductVersion=""5.0.0"" ProductFamily=""TVD""><UpdateEvent><EventID>XXXXXXXXX</EventID><Severity>XXXXXXXXX</Severity><GMTTime>XXXXXXXXX</GMTTime><ProductID>XXXXXXXXX</ProductID><Locale>XXXXXXXXX</Locale><Error>XXXXXXXXX</Error><Type>XXXXXXXXX</Type><Version>XXXXXXXXX</Version><InitiatorID>XXXXXXXXX</InitiatorID><InitiatorType>XXXXXXXXX</InitiatorType><SiteName>eXXXXXXXXX</SiteName></UpdateEvent></McAfeeCommonUpdater></UpdateEvents>

I think it is objecting to the duplicate double quotes. Does

mutate { gsub => [ "message", '""', '"' ] }

help?
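
To see why the doubled quotes matter, here is a minimal sketch outside Logstash. It uses Python's standard-library XML parser rather than REXML, so the error text differs from the one in the log, and the sample string is a shortened, made-up event in the same shape as the raw log above (the host name and event ID are hypothetical).

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample shaped like the raw log above:
# every double quote arrives doubled.
raw = ('<?xml version=""1.0"" encoding=""utf-8""?>'
       '<UpdateEvents><MachineInfo><MachineName>HOST1</MachineName></MachineInfo>'
       '<McAfeeCommonUpdater ProductName=""McAfee Agent"" ProductVersion=""5.0.0"">'
       '<UpdateEvent><EventID>2401</EventID></UpdateEvent>'
       '</McAfeeCommonUpdater></UpdateEvents>')


def parses(text: str) -> bool:
    """Return True if the text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


# Same substitution as the gsub above: collapse "" into ".
fixed = raw.replace('""', '"')

print(parses(raw))    # the doubled quotes break the XML declaration
print(parses(fixed))  # after collapsing them the document is well-formed
```

The doubled quotes make the XML declaration and the attributes malformed, so any conforming parser should reject the raw line; once they are collapsed the document parses normally.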

Oh, it works like a charm. By the way, @Badger, I am trying to parse EPO syslog, and I saw that Splunk has an excellent add-on for handling these logs. Its transforms.conf file contains this stanza:

[mcafee_epo_regex_field_extraction]
REGEX = <([\w-]+)>([^<]+?)<\/\1>
FORMAT = $1::$2
CLEAN_KEYS = false

It extracts every single field in the XML, no matter how complex the XML structure is, which is really cool and exactly what I need. I wonder if I could do the same thing with Logstash?

addon: https://splunkbase.splunk.com/app/5085/
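
For reference, here is a rough sketch of what that regex matches on its own, written in Python rather than as a Logstash filter. The function name `extract_leaf_fields` and the sample values are made up for illustration, and this only mimics the pattern, not Splunk's full REGEX/FORMAT handling. The same pattern could presumably be applied inside a Logstash ruby filter, but that is not shown here.

```python
import re

# Same pattern as the Splunk stanza: an opening tag with no attributes,
# text that contains no '<', and the matching closing tag.
LEAF = re.compile(r'<([\w-]+)>([^<]+?)</\1>')


def extract_leaf_fields(xml_text):
    """Collect tag -> list of values for every element the pattern matches."""
    fields = {}
    for name, value in LEAF.findall(xml_text):
        # Keep repeated tags instead of overwriting earlier values.
        fields.setdefault(name, []).append(value)
    return fields


# Hypothetical sample with the quotes already collapsed.
sample = ('<UpdateEvents><MachineInfo><MachineName>HOST1</MachineName>'
          '<OSName>Windows 10</OSName></MachineInfo>'
          '<McAfeeCommonUpdater ProductName="McAfee Agent">'
          '<UpdateEvent><EventID>2401</EventID><Severity>4</Severity></UpdateEvent>'
          '</McAfeeCommonUpdater></UpdateEvents>')

print(extract_leaf_fields(sample))
```

Note what the pattern skips: container elements such as MachineInfo (their content starts with another tag), elements with attributes such as McAfeeCommonUpdater, and the attribute values themselves. It flattens the document to its leaf elements, which is why the nesting depth does not matter.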
