input {
  file {
    path => "/usr/share/logstash/bin/myXML.xml"
    start_position => "beginning"
  }
}
#filter {
#  I DON'T KNOW WHAT TO PUT HERE
#}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    user => "elastic"
    password => "changeme"
  }
  stdout {}
}
I am confused as to how I should build my config file, the filters in particular. The XML file will have close to a hundred fields (hence the ". . . "), and some have sub-fields (an object-oriented way of encapsulating data within other data, like a class in Java). Is there a way to dynamically parse the XML file so I don't have to manually define the fields and their contents?
You should use a multiline codec on the input to consume the entire file as a single event. There are many threads about how to do that. Then you can use a Logstash xml filter to parse the XML.
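For what it's worth, a minimal sketch of that approach could look like the config below. The never-matching pattern, the sincedb_path, and the parsed_xml target field are illustrative choices, not something prescribed by the plugins; adjust them to your file.

input {
  file {
    path => "/usr/share/logstash/bin/myXML.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    # Aggregate every line into one event: the pattern is chosen so it
    # never matches, and auto_flush_interval flushes the buffered lines
    # once the file stops producing new ones.
    codec => multiline {
      pattern => "^THIS_WILL_NEVER_MATCH"
      negate => true
      what => "previous"
      auto_flush_interval => 2
      max_lines => 100000
    }
  }
}
filter {
  xml {
    source => "message"
    target => "parsed_xml"   # hypothetical field to hold the parsed tree
    store_xml => true
    force_array => false
  }
}

With store_xml enabled the filter builds the whole tree under the target field, so individual tags do not have to be listed by hand.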
Ok. My bad, I didn't see your configuration.
With your configuration, Logstash will read each line as a new event. To fix that, use the multiline codec. The multiline codec will aggregate multiple lines into a single log event; in this case it will produce one event containing the whole XML file. There are plenty of samples around multiline, please go through them. As I am on a mobile device I am unable to give you an exact config.
Secondly, once we have aggregated the lines into a single XML document, use Ruby code.
@Badger does the Logstash xml filter work to dynamically parse out every tag in the XML? In other words, do I have to specify in the filter each field that exists in my XML?
@Suman_Reddy1 Thank you. I will take a look into Ruby and Nokogiri. If you have any more info once you have time, I'd really appreciate that too. I'll try and keep learning.
Below is a recursive way of iterating over all elements in an XML document.
ruby {
  code => "
    require 'nokogiri'

    # Recursively walk every child node; print the content of text nodes
    # and descend into element nodes.
    def iterative(ele)
      ele.children.each do |tempNode|
        if tempNode.text?
          puts tempNode.content
        else
          iterative(tempNode)
        end
      end
    end

    xml_doc = Nokogiri::XML.parse(event.get('xml-data'))
    iterative(xml_doc)
  "
}
Above is the sample we used to parse XML and do some inline masking on the data. This should give you some insight into XML processing. If you don't have to do much manipulation on the XML, I would suggest Badger's solution rather than this one.
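If you do end up modifying the document in a ruby filter, a rough sketch of the masking idea might look like this (the <password> element name and the masked_xml field are made-up examples, not from the original config):

ruby {
  code => "
    require 'nokogiri'

    # Hypothetical masking example: blank out the text of every
    # <password> element and store the rewritten document on the event.
    doc = Nokogiri::XML.parse(event.get('message'))
    doc.xpath('//password').each { |node| node.content = '*****' }
    event.set('masked_xml', doc.to_xml)
  "
}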