How to parse and index XML in ES7

I need to parse and index XML in Elasticsearch.
One field should store the XML as raw data, and another field should contain a value retrieved by an XPath expression from the same XML.

You can use the Logstash xml filter to parse the XML string. It supports XPath expressions.
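A minimal sketch of the xml filter with an XPath expression, assuming an event whose `message` field holds a small document without namespaces such as `<order><id>42</id></order>` (the element names `order` and `id` and the destination field `order_id` are made up for illustration):

```
filter {
  xml {
    source => "message"          # field that holds the XML string
    store_xml => false           # only extract via XPath; no parsed copy, so no target is needed
    xpath => {
      # "XPath expression" => "destination field" (hypothetical names)
      "/order/id/text()" => "order_id"
    }
  }
}
```

As far as I know, the filter pushes XPath matches onto the destination field as an array, so `order_id` comes out as `["42"]` even when there is a single match.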

I'm aware that such a filter exists, but the documentation is brief and I couldn't build a working example from it. Can you show me how to achieve this and what a dummy config file would look like?
Thanks!

Can you share a sample of the XML and tell us which field you are trying to extract?

It looks something like this:

<dataDocument xmlns="http://www.fpml.org/FpML-5/confirmation" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" fpmlVersion="5-9" xsi:schemaLocation="http://www.fpml.org/FpML-5/confirmation ../../fpml-main-5-9.xsd http://www.w3.org/2000/09/xmldsig# ../../xmldsig-core-schema.xsd">
<trade>
<tradeHeader>
<partyTradeIdentifier>
<partyReference href="PartyA"/>
<tradeId tradeIdScheme="http://www.PartyA.com/com-trade-id">ACAVS1234567</tradeId>
</partyTradeIdentifier>
<tradeDate id="TradeDate">2014-04-08</tradeDate>
</tradeHeader>
<commodityPerformanceSwap>
<primaryAssetClass>Commodity</primaryAssetClass>
<productType>Commodity:Energy:Index:Swap:Cash</productType>
<effectiveDate>
<adjustableDate>
<unadjustedDate>2014-04-01</unadjustedDate>
<dateAdjustments>
<businessDayConvention>NONE</businessDayConvention>
</dateAdjustments>
</adjustableDate>
</effectiveDate>
<terminationDate id="TerminationDate">
<adjustableDate>
<unadjustedDate>2014-10-31</unadjustedDate>
<dateAdjustments>
<businessDayConvention>NONE</businessDayConvention>
</dateAdjustments>
</adjustableDate>
</terminationDate>
<commodityVarianceLeg>
<payerPartyReference href="PartyA"/>
<receiverPartyReference href="PartyB"/>
<calculationPeriodsSchedule id="varianceLegCalculationPeriodSchedule">
<periodMultiplier>1</periodMultiplier>
<period>T</period>
<balanceOfFirstPeriod>false</balanceOfFirstPeriod>
</calculationPeriodsSchedule>
<paymentDates>
<adjustableDates>
<unadjustedDate>2014-11-05</unadjustedDate>
<dateAdjustments>
<businessDayConvention>FOLLOWING</businessDayConvention>
<businessCenters>
<businessCenter>USNY</businessCenter>
</businessCenters>
</dateAdjustments>
</adjustableDates>
</paymentDates>
<commodity>
<!--  Note:   instrumentId is required only in Confirmation View   -->
<instrumentId instrumentIdScheme="http://www.fpml.org/coding-scheme/commodity-reference-price-3-0">MOP-CFR BRAZIL-FMB</instrumentId>
<specifiedPrice>Spot</specifiedPrice>
</commodity>
<notionalAmount id="varianceLegNotionalAmount">
<!--  Note that, because this swap does not have a reinvestment feature on the notional  -->
<!--  that the 'amount' below is the notional amount for the Term                        -->
<currency>USD</currency>
<amount>1000000.00</amount>
<reinvestmentFeature>false</reinvestmentFeature>
</notionalAmount>
<varianceStrikePrice>0.09000</varianceStrikePrice>
<varianceCalculation>
<valuationDates>
<calculationPeriodsScheduleReference href="varianceLegCalculationPeriodSchedule"/>
<dayType>CommodityBusiness</dayType>
<dayDistribution>All</dayDistribution>
</valuationDates>
<annualizationFactor>252</annualizationFactor>
<nAdjustment>false</nAdjustment>
</varianceCalculation>
</commodityVarianceLeg>
<marketDisruption>
<marketDisruptionEvent>AsSpecifiedInMasterAgreement</marketDisruptionEvent>
<disruptionFallbacks>AsSpecifiedInMasterAgreement</disruptionFallbacks>
</marketDisruption>
</commodityPerformanceSwap>
<documentation>
<masterAgreement>
<masterAgreementType>ISDA</masterAgreementType>
<masterAgreementVersion>2002</masterAgreementVersion>
<masterAgreementDate>2010-03-23</masterAgreementDate>
</masterAgreement>
<contractualDefinitions>ISDA2006</contractualDefinitions>
<contractualDefinitions>ISDA2005Commodity</contractualDefinitions>
</documentation>
</trade>
<party id="PartyA">
<partyId partyIdScheme="http://www.fpml.org/coding-scheme/external/iso17442">Party_A_LEI</partyId>
<country>US</country>
<organizationType>SwapDealer</organizationType>
</party>
<party id="PartyB">
<partyId partyIdScheme="http://www.fpml.org/coding-scheme/external/iso17442">Party_B_LEI</partyId>
<organizationType>Non-SD_Non-MSP</organizationType>
</party>
</dataDocument>

It follows the FpML standard.
What I'm trying to achieve: for each XML file I need one document in ES that contains the entire file as a string in one field, with its id taken from a field inside the XML:

Perhaps like this:

{
  "id": "a field from the XML; for the sake of the example, let's say tradeId",
  "xml": "the entire XML as a string"
}
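A sketch of a filter and output that would produce that shape, assuming an Elasticsearch 7 node at `http://localhost:9200` and a made-up index name `fpml-trades`. Because the root element declares a default namespace, the XPath binds that namespace URI to the prefix `fpml` (an arbitrary name) through the xml filter's `namespaces` option:

```
filter {
  xml {
    source => "message"
    store_xml => false               # keep only the XPath result, not a parsed copy
    namespaces => { "fpml" => "http://www.fpml.org/FpML-5/confirmation" }
    xpath => {
      "/fpml:dataDocument/fpml:trade/fpml:tradeHeader/fpml:partyTradeIdentifier/fpml:tradeId/text()" => "tradeId"
    }
  }
  mutate {
    # XPath results are stored as an array; use the first match as the id
    add_field => { "id" => "%{[tradeId][0]}" }
  }
  mutate {
    # keep the untouched XML string under the field name "xml"
    rename => { "message" => "xml" }
    remove_field => [ "tradeId" ]
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # assumption: local ES 7 node
    index => "fpml-trades"               # hypothetical index name
    document_id => "%{id}"               # the trade id becomes the Elasticsearch _id
  }
}
```

If the XPath finds nothing, `id` is left as the literal text `%{[tradeId][0]}`, so it is worth checking a few events on stdout first. Logstash also adds fields such as `@timestamp`, `path` and `host`; drop them with `remove_field` if the document should contain only `id` and `xml`.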

There is an example of an xml filter using xpath here.

I have previously worked with similar instructions. The thread you're referencing contains an error that I also encountered ("Last 80 unconsumed characters:").

That means your "XML" is not valid XML. There should be an exception logged telling you why. For example, in that thread it is ":exception=>#<REXML::ParseException: No close tag for /NewData".
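To see which events fail and what text actually reached the filter, you can print them separately. This sketch relies on the xml filter tagging events it cannot parse with `_xmlparsefailure` (as far as I know, that is the tag it uses); the rest of the pipeline is assumed:

```
output {
  if "_xmlparsefailure" in [tags] {
    # [message] shows exactly what the input handed to the xml filter,
    # e.g. a single line instead of the whole document
    stdout { codec => rubydebug }
  }
}
```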

If your XML is spread across multiple lines, you will need to combine them into a single event. If you are using a file input, a multiline codec may do the job. What does your input configuration look like?
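If each file holds exactly one XML document, one common approach is a multiline pattern that never matches, so every line is appended to the previous one and the whole file becomes one event. A sketch, where `^__no_line_starts_like_this__` is an arbitrary placeholder chosen only because it will not occur in the data:

```
input {
  file {
    path => "C:/Users/user/Documents/shareCommon/pocc.xml"
    start_position => "beginning"
    sincedb_path => "nul"                # Windows null device: read positions are not remembered
    codec => multiline {
      # Placeholder pattern that never matches: with negate => true and
      # what => "previous", every line is appended to the one before it
      pattern => "^__no_line_starts_like_this__"
      negate => true
      what => "previous"
      # The codec normally emits an event only when the next one starts,
      # which never happens here, so flush after 1 second without new lines
      auto_flush_interval => 1
      # Defaults are 500 lines and 10 MiB; raise them for large documents
      max_lines => 100000
      max_bytes => "50 MiB"
    }
  }
}
```

Unlike a pattern anchored on the root element, this also keeps a leading `<?xml ...?>` declaration in the same event as the document.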

Like this:

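input {
  file {
    path => "C:/Users/user/Documents/shareCommon/pocc.xml"
    start_position => "beginning"
    sincedb_path => "nul"              # Windows null device: re-read the file on every run
    type => "xml"
    codec => multiline {
      # A line starting with the root element <dataDocument> begins a new event;
      # every other line is appended to the previous one.
      pattern => "^<dataDocument"
      charset => "ISO-8859-1"
      negate => true
      what => "previous"
      max_lines => 100000
      # Emit the buffered event after 1 s without new lines; otherwise the
      # last (here: only) document in the file is never flushed.
      auto_flush_interval => 1
    }
  }
}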
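filter {
  xml {
    source => "message"
    target => "doc"                    # parsed structure goes here (required when store_xml is true)
    store_xml => true
    # The root element is <dataDocument>, not <FpML>, and it declares a default
    # namespace; strip namespaces so plain element names match in the XPath.
    remove_namespaces => true
    xpath => { "/dataDocument/party/partyId/text()" => "fpml" }
  }
}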


output {

  stdout {
    codec => rubydebug
  }
    
}  
