Xpath fails to extract

Hi, I need to extract one value from parsed XML. My Logstash configuration is:

xml {
    source => "contentRequest"
    target => "contentRequest_field"
    store_xml => false
    xpath => [ "/datianagrafici/datipersonali/codicefiscale", "codicefiscale" ]
}

If I set store_xml => true, I see these extracted fields:

[screenshot: 2022-12-05_16-12]

I need to extract the value of codicefiscale. Based on this view, I set the xpath configuration this way:

xpath => [ "/datianagrafici/datipersonali/codicefiscale", "codicefiscale" ]

I also tried to set

xpath => [ "//datianagrafici/datipersonali/codicefiscale", "codicefiscale" ]

but I get the same result. I expect a new field called codicefiscale in Elasticsearch, but I don't see it. Is my configuration correct?

Thanks

What does the source XML look like?

Unfortunately the XML is not correctly formatted, but the Logstash xml plugin still extracts all the fields correctly. I don't know whether this is a problem for xpath. contentRequest is exactly:

{ "xml": "<lavoratore><datiinvio><dataultimoagg>2019-05-22</dataultimoagg><codiceentetit>xxxxxxx</codiceentetit><tipovariazione>01</tipovariazione><datadinascita>1985-08-23</datadinascita></datiinvio><datianagrafici><datipersonali><codicefiscale>xxxxxxxxxx</codicefiscale><cognome>xxxxxx</cognome><nome>xxxxxxxxx</nome><sesso>x</sesso><datanascita>1985-08-23</datanascita><codcomune>xxxx</codcomune><codcittadinanza>xxx</codcittadinanza></datipersonali><residenza><codcomune>xxxx</codcomune><cap>xxxx</cap><indirizzo>xxxxxxx</indirizzo><localita /></residenza><domicilio><codcomune>xxx</codcomune><cap>xxxxx</cap><indirizzo>xxxxxxxx</indirizzo></domicilio><recapiti><telefono>0000000000</telefono><cellulare>3xxxxxxx</cellulare></recapiti></datianagrafici></lavoratore>"}

Indented version:

<lavoratore>
   <datiinvio>
      <dataultimoagg>xxxx-xx-xx</dataultimoagg>
      <codiceentetit>xxxxxxx</codiceentetit>
      <tipovariazione>xx</tipovariazione>
      <datadinascita>xxxx-xx-xx</datadinascita>
   </datiinvio>
   <datianagrafici>
      <datipersonali>
         <codicefiscale>xxxxxxxxxx</codicefiscale>
         <cognome>xxxxxx</cognome>
         <nome>xxxxxxxxx</nome>
         <sesso>x</sesso>
         <datanascita>xxxx-xx-xx</datanascita>
         <codcomune>xxxx</codcomune>
         <codcittadinanza>xxx</codcittadinanza>
      </datipersonali>
      <residenza>
         <codcomune>xxxx</codcomune>
         <cap>xxxx</cap>
         <indirizzo>xxxxxxx</indirizzo>
         <localita />
      </residenza>
      <domicilio>
         <codcomune>xxx</codcomune>
         <cap>xxxxx</cap>
         <indirizzo>xxxxxxxx</indirizzo>
      </domicilio>
      <recapiti>
         <telefono>0000000000</telefono>
         <cellulare>xxxxxxx</cellulare>
      </recapiti>
   </datianagrafici>
</lavoratore>

Thanks

The parsing used for store_xml is fundamentally different to the parsing used for xpath. The former uses the xml-simple library, the latter uses Nokogiri.

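Nokogiri only works on correctly formatted XML, whereas XmlSimple tolerates all kinds of leading or trailing junk around the XML. Here contentRequest is a JSON object with the XML inside its xml member, so the JSON wrapper has to be parsed away before xpath sees the document.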

Try

    json { source => "contentRequest" target => "[@metadata][contentRequest]" }
    xml {
        source => "[@metadata][contentRequest][xml]"
        store_xml => false
        xpath => { "//datianagrafici/datipersonali/codicefiscale/text()" => "codicefiscale" }
    }
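
If you want to check this in isolation before touching the real pipeline, a minimal test pipeline along these lines might help. It is only a sketch: the generator input and the shortened sample document are my assumptions (the sample line stands in for your contentRequest value, so the json filter reads from message here), and the filters are the same as above.

    input {
        generator {
            count => 1
            # Hypothetical sample event: a JSON object whose "xml" member holds a shortened document
            lines => [ '{ "xml": "<lavoratore><datianagrafici><datipersonali><codicefiscale>xxxxxxxxxx</codicefiscale></datipersonali></datianagrafici></lavoratore>" }' ]
        }
    }
    filter {
        # Parse the JSON wrapper first, keeping the result out of the indexed event
        json { source => "message" target => "[@metadata][contentRequest]" }
        # Run xpath against the bare XML string only
        xml {
            source => "[@metadata][contentRequest][xml]"
            store_xml => false
            xpath => { "//datianagrafici/datipersonali/codicefiscale/text()" => "codicefiscale" }
        }
    }
    output {
        stdout { codec => rubydebug }
    }

As far as I know the xpath option stores its matches as an array, so expect codicefiscale to show up as a one-element array rather than a plain string.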

You saved me! So it's like I thought, xpath thinks differently

Thanks!
