I'm indexing XML into Elasticsearch using the Logstash xml filter. Below is a simple example (the original XML is very large):
<?xml version="1.0" encoding="ISO-8859-1"?><FATURA><DADOS_CADASTRAIS><CLIENTE Nome="john" A_C="tex" Id_Cliente="001" CPF_CNPJ="919191"></CLIENTE></DADOS_CADASTRAIS></FATURA>
My Logstash input and filter are:
input {
  s3 {
    "bucket" => "xx"
  }
}
filter {
  xml {
    source => "message"
    store_xml => true
    target => "theXML"
    force_array => false
    remove_field => ["message"]
  }
}
When I index XML encoded as ISO-8859-1, I receive this error:
/NF></NOTAS_FISCAIS></FATURA>", :exception=>#<REXML::ParseException: missing attribute quote Line: 1 Position: 98173 Last 80 unconsumed characters: <CLIENTE Nome=\"john\" A_C=\"x\" Id_C>, :backtrace=>["uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/baseparser.rb:374:in
pull_event'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/baseparser.rb:185:in pull'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/treeparser.rb:23:in
parse'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:288:in build'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:45:in
initialize'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:971:in parse'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:164:in
xml_in'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in xml_in'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-xml-4.0.5/lib/logstash/filters/xml.rb:182:in
filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:145:in do_filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:164:in
block in multi_filter'", "org/jruby/RubyArray.java:1734:in each'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:161:in
multi_filter'", "/usr/share/logstash/logstash-core/lib/logstash/filter_delegator.rb:47:in multi_filter'", "(eval):69:in
block in filter_func'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:445:in filter_batch'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:424:in
worker_loop'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:386:in block in start_workers'"]}
However, when I manually change the encoding to UTF-8, I get no error and the XML is indexed the way I need it.
I tried adding a mutate filter to change it. The encoding declaration in the message is changed, but I receive the same error. The filter looks like this:
filter { mutate { gsub => ["message", "ISO-8859-1", "UTF-8"] } }
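The gsub above only rewrites the text of the XML declaration. It does not convert the bytes of the message. One direction that may be worth testing is sketched below, assuming the objects in S3 really contain ISO-8859-1 bytes: declare the charset on the input codec, so that Logstash converts each message to UTF-8 before the xml filter sees it. The `charset` option of the `plain` codec is a standard Logstash setting. Keeping the gsub in place, so that the declaration matches the converted content, is an assumption. Whether this combination avoids the REXML error has not been verified here.

```
input {
  s3 {
    "bucket" => "xx"
    # Assumption: the source files are ISO-8859-1; the codec converts them to UTF-8
    codec => plain { charset => "ISO-8859-1" }
  }
}
filter {
  # Assumption: rewrite the declaration so it agrees with the converted message
  mutate { gsub => ["message", "ISO-8859-1", "UTF-8"] }
  xml {
    source => "message"
    store_xml => true
    target => "theXML"
    force_array => false
    remove_field => ["message"]
  }
}
```

If the error persists with this configuration, the cause is probably not the byte encoding alone. In that case the attribute at the reported position (98173) in the original file would be the next thing to inspect.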