How can I create an index on XML tags?

Hello,

I am new to Logstash and have the following requirement.

I want to index on XML tags. The XML is stored in a database table column. I am able to index the column that contains the XML, but the requirement is to index particular XML tags.

Could you please help me with an example?

The XML from the database table column is below:

<?xml version="1.0" encoding="UTF-8"?>
<alert-header>
    <elem name="alertDate">2019-01-10 01:56:43</elem>
    <elem name="score">100</elem>
    <elem name="alertEntityKey">1539912_029_07/01/2018        </elem>
    <elem name="partyType">Entity</elem>
    <elem name="partyYOB"/>
    <elem name="partyBirthLocation"/>
    <elem name="ahData">
        <elem name="alertDate">2019-01-10 01:56:43</elem>
    </elem>
    <elem name="ahData">
        <elem name="jobID">01-10-2019</elem>
    </elem>
    <elem name="ahData">
        <elem name="jobName">TEST_PID</elem>
    </elem>
    <elem name="ahData">
        <elem name="jobType">LARGEBATCH</elem>
    </elem>
    <elem name="ahData">
        <elem name="score">100</elem>
    </elem>
    <elem name="ahData">
        <elem name="numberOfHits">7</elem>
    </elem>
    <elem name="ahData">
        <elem name="partyKey">1539912_029_07/01/2018</elem>
    </elem>
    <elem name="ahData">
        <elem name="partySourceId"/>
    </elem>
    <elem name="ahData">
        <elem name="partyName">ISIS IN THE ISLAMIC SAHEL</elem>
    </elem>
    <elem name="ahData">
        <elem name="partyLName">ISIS IN THE ISLAMIC SAHEL</elem>
    </elem>
    <elem name="ahData">
        <elem name="partyAliases"/>
    </elem>
    <elem name="ahData">
        <elem name="alertType">Sanctions</elem>
    </elem>
    <partyIds/>
    <elem name="partyNatCountries">
        <elem name="countryCd"/>
    </elem>
    <elem name="partyAddresses">
        <elem name="partyAddressLine1"/>
        <elem name="partyAddressLine2"/>
        <elem name="partyCity"/>
        <elem name="partyPostalCd"/>
        <elem name="partyStateProvince"/>
        <elem name="countryCd"/>
    </elem>
</alert-header>

I want to index on jobID, jobName, etc.

You can parse the XML using

xml { source => "message" target => "[@metadata][XML]" store_xml => true }

The parsed result will look like this:

           "XML" => {
    "elem" => [
        [ 0] {
               "name" => "alertDate",
            "content" => "2019-01-10 01:56:43"
        },
        [ 1] {
               "name" => "score",
            "content" => "100"
        },
[...]
        [ 6] {
            "name" => "ahData",
            "elem" => [
                [0] {
                       "name" => "alertDate",
                    "content" => "2019-01-10 01:56:43"
                }
            ]
        },

You can use a ruby filter to iterate over the array. If an array entry has both name and content fields, use them to add a field to the event. Otherwise, if the entry has an elem field, apply the same check to each of its children. Something like this:

    ruby {
        code => '
            event.get("[@metadata][XML][elem]").each { |x|
                if x["name"] and x["content"]
                    event.set(x["name"], x["content"])
                else
                    if x["elem"].kind_of?(Array)
                        x["elem"].each { |y|
                            if y["name"] and y["content"]
                                event.set(y["name"], y["content"])
                            end
                        }
                    end
                end
            }
        '
    }

Then you may need special handling for some of the fields, but this should get you started.
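For anyone not familiar with Ruby, here is the same flattening logic as a small standalone sketch that runs outside Logstash. It is only an illustration: flatten_elems is a hypothetical helper name, a plain Hash stands in for the event, and the sample input is hand-written to match the shape of the parsed output shown above.

```ruby
# Standalone sketch of the flattening logic (not Logstash code).
# flatten_elems is a hypothetical helper; a Hash stands in for the event.
def flatten_elems(elems)
  fields = {}
  elems.each do |x|
    if x["name"] && x["content"]
      # Simple entry such as alertDate or score: copy it directly.
      fields[x["name"]] = x["content"]
    elsif x["elem"].kind_of?(Array)
      # Wrapper entry such as ahData: apply the same check to its children.
      x["elem"].each do |y|
        fields[y["name"]] = y["content"] if y["name"] && y["content"]
      end
    end
  end
  fields
end

# Hand-written sample shaped like the parsed XML shown above.
PARSED = [
  { "name" => "alertDate", "content" => "2019-01-10 01:56:43" },
  { "name" => "score", "content" => "100" },
  { "name" => "partyYOB" },
  { "name" => "ahData", "elem" => [{ "name" => "jobName", "content" => "TEST_PID" }] },
  { "name" => "ahData", "elem" => [{ "name" => "partySourceId" }] }
]

RESULT = flatten_elems(PARSED)
puts RESULT.inspect
```

With this sample, the result contains alertDate, score and jobName. Empty tags such as partyYOB and partySourceId have no content, so they are skipped.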

Thanks for the response.

My current logstash-config.conf is below:

input {
jdbc {
#input Configuration
jdbc_connection_string => "jdbc:oracle:thin:@oraasgtd37-scan.nam.nsroot.net:8889/SID"
jdbc_user => "admin"
jdbc_password => "*****"
jdbc_driver_library => "I:\Jars\ojdbc6.jar"
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
statement => "select html_file_key from alerts where deleted =0"
#use_column_value => true
#tracking_column => "alert_internal_id"
#schedule => "* * * * *"

	}

}

output {
elasticsearch {
#output configuration
hosts => "http://localhost:9200"
index => "alert_index"
document_type => "alert"
#document_id => "%{alert_internal_id}"
}
stdout{
codec => rubydebug
}
}

What changes are required in this to parse the XML?

Note: HTML_FILE_KEY returns the XML.

Add

filter {
    xml { source => "html_file_key" target => "[@metadata][XML]" store_xml => true }
    ruby {
        code => '
            event.get("[@metadata][XML][elem]").each { |x|
                if x["name"] and x["content"]
                    event.set(x["name"], x["content"])
                else
                    if x["elem"].kind_of?(Array)
                        x["elem"].each { |y|
                            if y["name"] and y["content"]
                                event.set(y["name"], y["content"])
                            end
                        }
                    end
                end
            }
        '
    }
}

I added this filter as is, but I am getting an error.

[ERROR][logstash.filters.ruby ] Ruby exception occurred: undefined method `each' for nil:NilClass

Please help me out with this; I am not familiar with Ruby.

I suggest changing the output to be

stdout { codec => rubydebug { metadata => true } }

and see if the XML was successfully parsed to include [@metadata][XML][elem]
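The "undefined method `each' for nil:NilClass" error means event.get returned nil, that is, [@metadata][XML][elem] was not on the event when the ruby filter ran. One way to make the filter tolerate that is to check the value before iterating. The sketch below shows only the guard and runs on its own: FakeEvent is a stand-in that mimics get, set and tag, copy_top_level_elems is a hypothetical helper name, and the _xml_elem_missing tag is a made-up name. The nested ahData handling would stay the same as in the filter above.

```ruby
# FakeEvent is a stand-in for the Logstash event so this sketch runs alone.
# It only mimics get, set and tag with a flat Hash.
class FakeEvent
  attr_reader :data

  def initialize(data)
    @data = data
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value
  end

  def tag(name)
    (@data["tags"] ||= []) << name
  end
end

# Hypothetical helper showing the guard: skip the loop when elem is missing.
def copy_top_level_elems(event)
  elems = event.get("[@metadata][XML][elem]")
  unless elems.is_a?(Array)
    # Tag name is made up; it just marks events whose XML was not parsed.
    event.tag("_xml_elem_missing")
    return
  end
  elems.each do |x|
    event.set(x["name"], x["content"]) if x["name"] && x["content"]
  end
end

# An event without the parsed XML: no exception, only a tag.
MISSING = FakeEvent.new({})
copy_top_level_elems(MISSING)

# An event with the parsed XML: the field is copied as before.
PRESENT = FakeEvent.new({ "[@metadata][XML][elem]" => [{ "name" => "score", "content" => "100" }] })
copy_top_level_elems(PRESENT)
```

Tagging the event rather than silently skipping it makes it easy to find the rows whose XML did not parse.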

Thank you so much!
I am able to parse the XML now. But my requirement is to search by name. How can I write the URI to get alertDate or score from this parsed XML? Could you please help me with this?

That is what the ruby filter does.

But how do I get the content for a given name in Elasticsearch?

Any update on this, sir?

What exactly do you not like about the events created by the ruby filter? Please show an event and what you want to change in it.

Hi Badger, actually I am still struggling to add fields (name => content) to Elasticsearch. I am not familiar with Ruby.

Is there any other way, using XPath?
