XML filter on Logstash

I'm indexing XML into Elasticsearch using the Logstash xml filter. Below is a simplified example (the original XML is very large):

<?xml version="1.0" encoding="ISO-8859-1"?><FATURA><DADOS_CADASTRAIS><CLIENTE Nome="john" A_C="tex" Id_Cliente="001" CPF_CNPJ="919191"></CLIENTE></DADOS_CADASTRAIS></FATURA>

My Logstash input and filter are:

input {
  s3 {
    "bucket" => "xx"
  }
}

filter {
  xml {
    source => "message"
    store_xml => true
    target => "theXML"
    force_array => false
    remove_field => ["message"]
  }
}

When I index XML encoded as ISO-8859-1, I receive this error:

/NF></NOTAS_FISCAIS></FATURA>", :exception=>#<REXML::ParseException: missing attribute quote Line: 1 Position: 98173 Last 80 unconsumed characters: <CLIENTE Nome=\"john\" A_C=\"x\" Id_C>, :backtrace=>["uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/baseparser.rb:374:inpull_event'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/baseparser.rb:185:in pull'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/treeparser.rb:23:inparse'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:288:in build'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:45:ininitialize'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:971:in parse'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:164:inxml_in'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in xml_in'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-xml-4.0.5/lib/logstash/filters/xml.rb:182:infilter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:145:in do_filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:164:inblock in multi_filter'", "org/jruby/RubyArray.java:1734:in each'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:161:inmulti_filter'", "/usr/share/logstash/logstash-core/lib/logstash/filter_delegator.rb:47:in multi_filter'", "(eval):69:inblock in filter_func'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:445:in filter_batch'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:424:inworker_loop'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:386:in block in start_workers'"]}

BUT! When I manually change the encoding to UTF-8, I don't get the error and the XML is indexed the way I need it.

I tried adding a mutate filter to change it. The "encoding" value does change, but I still receive the same error. Like this: :confused:
filter { mutate { gsub => ["message", "ISO-8859-1", "UTF-8"] } }

The mutate does come before the xml filter, right?

Yes @Badger

Like this:

input {
  s3 {
    "bucket" => "x"
  }
}

filter {
  mutate { gsub => ["message", "ISO-8859-1", "UTF-8"] }
}

filter {
  xml {
    source => "message"
    store_xml => true
    target => "theXML"
    force_array => false
    remove_field => ["message"]
  }
}

I can see the change in the field, but I still get the error in Logstash.

I solved this by adding a codec to the input :wink:

Thanks
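
For anyone landing here later: the exact codec settings were not posted in this thread. A minimal sketch, assuming the plain codec with its charset option set on the s3 input, might look like the following. The bucket name is the placeholder from the question, and the charset value is an assumption based on the encoding declared in the XML.

```
input {
  s3 {
    "bucket" => "xx"
    # Assumption: declare the real encoding of the files so that Logstash
    # transcodes the bytes to UTF-8 before any filter sees the event.
    codec => plain { charset => "ISO-8859-1" }
  }
}

filter {
  xml {
    source => "message"
    store_xml => true
    target => "theXML"
    force_array => false
    remove_field => ["message"]
  }
}
```

This would probably also explain why the mutate gsub attempt did not help: it only rewrites the text of the XML declaration, while the bytes of the message stay encoded as ISO-8859-1 and are still read as UTF-8 by default.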
