Hi guys!
I need to index XML files into an Elasticsearch cluster, following this flow:
S3 -> Logstash -> Elasticsearch
I read about using xpath in the xml filter, but my XML is very large, so I can't map the whole document field by field.
I need each field in the XML to become a field in Elasticsearch.
This is a very simple example of an XML file:
<?xml version="1.0" encoding="ISO-8859-1"?><FAT><DATA><CLIENT Name="bla bla bla" A_C="bla01" Id="001" CP="00981726"></CLIENT></DATA></FAT></xml>
Is the xml filter the best option for this?
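For context, a minimal sketch of the kind of pipeline being described; the bucket, prefix, host, and index names are placeholders, not values from this thread:

input {
  s3 {
    bucket => "my-xml-bucket"        # placeholder bucket name
    prefix => "XML/2018/"            # placeholder prefix
  }
}
filter {
  xml {
    source => "message"              # raw XML arrives in the message field
    store_xml => true
    target => "parsed"               # parsed document lands under this field
    force_array => false
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]      # placeholder cluster address
    index => "xml-%{+YYYY.MM}"       # placeholder index pattern
  }
}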
Badger
August 1, 2018, 3:14pm
2
That is not valid XML (it opens <CLIENT> and closes </CLIENTE>). Also, you need to strip off the trailing </xml>, which can be done with
mutate { gsub => ["message", "</xml>$", ""] }
Then you can parse it using
xml { source => "message" store_xml => true target => "theXML" force_array => false }
which gets you
"theXML" => {
"DATA" => {
"CLIENT" => {
"A_C" => "bla01",
"CP" => "00981726",
"Name" => "bla bla bla",
"Id" => "001"
}
}
}
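If the goal is for each attribute to end up as a top-level field rather than nested under theXML, one possible follow-up (a sketch, not something confirmed in this thread) is to rename the nested fields after the xml filter; the field paths below come from the example document and would need to match the real XML:

filter {
  # Runs after the xml filter above; hoists selected attributes to the top level.
  mutate {
    rename => {
      "[theXML][DATA][CLIENT][Name]" => "client_name"
      "[theXML][DATA][CLIENT][A_C]"  => "client_a_c"
      "[theXML][DATA][CLIENT][Id]"   => "client_id"
      "[theXML][DATA][CLIENT][CP]"   => "client_cp"
    }
  }
}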
Hi @Badger
I configured mutate and xml, but I'm getting this error:
:exception=>#<REXML::ParseException: missing attribute quote Line: 1 Position: 62576 Last 80 unconsumed characters:
My config:
filter { mutate { gsub => ["message", "</xml>$", ""] } }
filter { xml { source => "message" store_xml => true target => "theXML" force_array => false } }
The index is created in Elasticsearch, but the XML fields are not extracted; the document only contains the raw message field, like this:
"_index" : "teste-2018.08", "_type" : "doc", "_id" : "_1FW9mQBCtVZHh-PRMtz", "_score" : 1.0, "_source" : { "tags" : [ "_xmlparsefailure" ], "@timestamp" : "2018-08-01T16:33:22.758Z", "@version" : "1", "message" : "<?xml version=\\\"1.0\\\" encoding=\\\"ISO-8859-1\\\"?><FAT ...
Badger
August 1, 2018, 4:42pm
4
Immediately after that error message it will show the XML that has an issue. I suspect your XML looks like this
<foo><bar a=1/></foo>
That is not valid XML. It has to be
<foo><bar a="1"/></foo>
You might be able to fix the "XML" using stuff like
mutate { gsub => [ "message", "( a=)([^/> ]+)([/> ])", '\1"\2"\3' ] }
@Badger
Yes, my XML is valid, look:
<?xml version=\\\"1.0\\\" encoding=\\\"ISO-8859-1\\\"?><FAT><DATA><CLIENT Nome=\\\"bla bla\\\" A_C=\\\"bla01 - .\\\" Id=\\\"0010\\\" CP=\\\"00098281\\\"></CLIENT></DATA></FAT></xml>
This is only part of the XML; the full file is more than 2,000 lines.
Badger
August 1, 2018, 4:57pm
6
As I said, immediately after the error message is the problematic XML.
[2018-08-01T12:55:15,376][WARN ][logstash.filters.xml ] Error parsing xml with XmlSimple {:source=>"message", :value=>"<foo><bar a=1></foo>", :exception=>#<REXML::ParseException: missing attribute quote
Line: 1
Position: 20
Last 80 unconsumed characters:
<bar a=1></foo>>,
Are you able to post the full error message including the unconsumed characters?
@Badger Sure!
:exception=>#<REXML::ParseException: missing attribute quote Line: 1 Position: 102125 Last 80 unconsumed characters: <CLIENT Nome=\"bla bla \" A_C=\"bla01 - .\" Id>, :backtrace=>[
"uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/baseparser.rb:374:in `pull_event'",
"uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/baseparser.rb:185:in `pull'",
"uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/treeparser.rb:23:in `parse'",
"uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:288:in `build'",
"uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:45:in `initialize'",
"/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:971:in `parse'",
"/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:164:in `xml_in'",
"/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in `xml_in'",
"/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-xml-4.0.5/lib/logstash/filters/xml.rb:182:in `filter'",
"/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:145:in `do_filter'",
"/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:164:in `block in multi_filter'",
"org/jruby/RubyArray.java:1734:in `each'",
"/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:161:in `multi_filter'",
"/usr/share/logstash/logstash-core/lib/logstash/filter_delegator.rb:47:in `multi_filter'",
"(eval):69:in `block in filter_func'",
"/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:445:in `filter_batch'",
"/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:424:in `worker_loop'",
"/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:386:in `block in start_workers'"]}
Badger
August 1, 2018, 5:09pm
8
The log message adds a > to the XML. So the end of the XML it is consuming is
CLIENT Nome=\"bla bla \" A_C=\"bla01 - .\" Id
I think the XML might be truncated. What input are you using?
@Badger
My input is very simple:
input { s3 { "bucket" => "fat" "prefix" => "XML/XX/2018/07/13/1/00" } }
Each XML file is about 350 KB.
Badger
August 1, 2018, 5:50pm
10
I cannot reconcile that error message with the source code unless the input event literally ended at Id.
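If the event really is arriving truncated, one thing worth checking (a sketch, not a confirmed diagnosis) is whether the s3 input is delivering each file as several events; a multiline codec that starts a new event only at the XML declaration would make each file arrive as a single message:

input {
  s3 {
    bucket => "fat"
    prefix => "XML/XX/2018/07/13/1/00"
    # Fold every line that does not start with an XML declaration into the
    # previous event, so each document arrives as one message.
    codec => multiline {
      pattern => "^<[?]xml"           # character class avoids escaping the ?
      negate => true
      what => "previous"
      auto_flush_interval => 2        # seconds to wait before flushing the last event
      max_lines => 20000              # the files are reported to be over 2,000 lines
    }
  }
}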
system
Closed
August 29, 2018, 5:51pm
11
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.