XML and Logstash aren't particularly well-matched, so if you have other options for the file format (such as ndjson), you might have better luck. The primary reason is that Logstash is an engine for processing streams of data (e.g., data being appended to files), while XML by definition cannot be appended to because a legal XML file already contains its closing element.
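For contrast, with a line-delimited format like ndjson each record is complete on its own line, so new records can be appended indefinitely and consumed as they arrive. A minimal sketch (the path is hypothetical; the file input and json codec are standard Logstash plugins):

input {
  file {
    path => "/some/absolute/path/test.ndjson"
    codec => json # one complete JSON document per line; each appended line becomes one event
  }
}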
The sample XML you have pasted is also not valid XML:
╭─{ yaauie@castrovel:~/src/elastic/discuss-scratch/124158-xml }
╰─○ xmllint example-input.xml
example-input.xml:2: parser error : error parsing attribute name
<ELEM=0>
^
example-input.xml:2: parser error : attributes construct error
<ELEM=0>
^
example-input.xml:2: parser error : Couldn't find end of Start Tag ELEM line 2
<ELEM=0>
^
example-input.xml:7: parser error : error parsing attribute name
<ELEM=1>
^
example-input.xml:7: parser error : attributes construct error
<ELEM=1>
^
example-input.xml:7: parser error : Couldn't find end of Start Tag ELEM line 7
<ELEM=1>
^
example-input.xml:12: parser error : error parsing attribute name
<ELEM=2>
^
example-input.xml:12: parser error : attributes construct error
<ELEM=2>
^
example-input.xml:12: parser error : Couldn't find end of Start Tag ELEM line 12
<ELEM=2>
^
[error: 1]
That said, if you had valid XML, the pipeline would likely have a shape something like the following:
input {
  # ...
}
filter {
  # replaces value at `message` with the data structure it represents
  xml {
    source => "message"
    target => "message"
  }
  # emits one event per element in data structure; operates on `message` field by default
  split { }
}
filter {
  # any additional filters to enrich/transform the individual elements
}
output {
  elasticsearch {
    # ...
  }
}
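While iterating on a pipeline like this, it can help to temporarily swap the elasticsearch output for a stdout output with the rubydebug codec, so you can inspect exactly what each per-element event looks like:

output {
  # pretty-prints each event; useful while developing, not for production
  stdout { codec => rubydebug }
}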
It is still not valid. <ELEM=0> is missing an attribute name, and the attribute value is unquoted; both will trip up the XML parser. <ELEM x="0"> is what it wants.
╭─{ yaauie@castrovel:~/src/elastic/discuss-scratch/124158-xml }
╰─○ xmllint sample.xml
sample.xml:2: parser error : error parsing attribute name
<ELEM=0>
^
sample.xml:2: parser error : attributes construct error
<ELEM=0>
^
sample.xml:2: parser error : Couldn't find end of Start Tag ELEM line 2
<ELEM=0>
^
sample.xml:6: parser error : Opening and ending tag mismatch: TEST_INTF line 1 and ELEM
</ELEM>
^
sample.xml:7: parser error : Extra content at the end of the document
<ELEM=1>
^
[error: 1]
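For reference, a minimal well-formed reshaping of the sample might look something like the following (a sketch; it assumes TEST_INTF is the intended root element, as in the xmllint output above, and that the bare numbers were meant to be attribute values):

<TEST_INTF>
  <ELEM x="0">
    <!-- ... -->
  </ELEM>
  <ELEM x="1">
    <!-- ... -->
  </ELEM>
</TEST_INTF>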
If you don't have an explicit reason to use XML, I would seriously suggest avoiding it; as stated earlier in this thread, while it is technically possible to parse XML when we need to, the format is not well matched to how Logstash works.
OK, there are two things to consider. The first is how to ingest the file and get one event for each outer XML element. There are a few different use cases here. If you have something like a J9 JVM garbage collection log, where the JVM is forever appending XML to it, a logstash file input is an excellent fit. However, if you have a single file that contains XML, will never change, and just needs to be ingested once, then a file input is a poor fit (not least because logstash does not exit when it reaches EOF; it waits and tails the file), and it is much easier to use a stdin input.
If you were going to use a file input, it would look something like this. You have to set auto_flush_interval because there is no second event to trigger emission of the first; I regard this as an ugly hack.
input {
  file {
    path => "/some/absolute/path/test.xml"
    sincedb_path => "/dev/null"
    start_position => "beginning"
    codec => multiline {
      what => "previous"
      pattern => "^" # every line has a beginning, so every line is appended to the previous event
      auto_flush_interval => 2
    }
  }
}
With a stdin input I would do this. Every line of the file fails to match the sentinel pattern, so (with negate) it gets appended to the previous event; when the echoed sentinel arrives it starts a new event, flushing the entire file as a single event:
(cat file.xml; echo "Monsieur Spalanzani n'aime pas la musique") | ./logstash -f ...
input {
  stdin {
    codec => multiline {
      pattern => "^Monsieur Spalanzani n'aime pas la musique"
      negate => "true"
      what => "previous"
    }
  }
}
Next up, parse the XML and split the ELEM array up.
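A sketch of that step, assuming the root element is TEST_INTF as in the xmllint output above (the theXML target name is arbitrary):

filter {
  # parse the reassembled XML document into a data structure under [theXML]
  xml {
    source => "message"
    target => "theXML"
  }
  # emit one event per entry in the ELEM array
  split {
    field => "[theXML][ELEM]"
  }
}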