Store XML to Index in ES using logstash

This is my first Logstash config.
My config file is shared below; I'm using version 5.6.4.

input {
  file {
    path => "sample.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  xml {
    source => "message"
  }
}

output {
  elasticsearch {
    index => "test"
    document_type => "test_info"
    hosts => "localhost:9200"
  }
}

My sample XML file:
SampleXML

I want to create an index with column names ACCOUNT_NUM, CUSTOMER_REF and SOURCE_SYS, and store the values for each ELEM.

Sample Table view of ES:
TEST_NUM | REF | SYS
ABC123234 | GHI123 | HOME
TST000116 | ABC123 | HOME

Apologies, I was an Oracle DB guy until now and have only recently started learning the ELK stack. Help me, fellas :slight_smile:
(EDITED the XML)

Your sample XML is not valid XML. Can you get a valid XML input?

XML and Logstash aren't particularly well-matched, so if you have other options for the file format (such as ndjson), you might have better luck. The primary reason is that Logstash is an engine for processing streams of data (e.g., data being appended to files), while XML by definition cannot be appended to because a legal XML file already contains its closing element.
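
To illustrate the ndjson alternative, here is a rough sketch, assuming you can produce a file with one JSON object per line. The path and file name are placeholders, and the field names are simply borrowed from the sample in the question. With that layout a plain file input with the json codec yields one event per line, with no xml or split filter needed:

```
# /some/absolute/path/sample.ndjson (hypothetical), one JSON object per line:
#   {"TEST_NUM":"ABC123234","REF":"GHI123","SYS":"HOME"}
#   {"TEST_NUM":"TST000116","REF":"ABC123","SYS":"HOME"}
input {
  file {
    path => "/some/absolute/path/sample.ndjson"   # placeholder path
    start_position => "beginning"
    sincedb_path => "/dev/null"
    # the file input is already line-oriented, so each line is decoded as one JSON document
    codec => "json"
  }
}
```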

The sample XML you have pasted is also not valid XML:

╭─{ yaauie@castrovel:~/src/elastic/discuss-scratch/124158-xml }
╰─○ xmllint example-input.xml
example-input.xml:2: parser error : error parsing attribute name
<ELEM=0>
     ^
example-input.xml:2: parser error : attributes construct error
<ELEM=0>
     ^
example-input.xml:2: parser error : Couldn't find end of Start Tag ELEM line 2
<ELEM=0>
     ^
example-input.xml:7: parser error : error parsing attribute name
<ELEM=1>
     ^
example-input.xml:7: parser error : attributes construct error
<ELEM=1>
     ^
example-input.xml:7: parser error : Couldn't find end of Start Tag ELEM line 7
<ELEM=1>
     ^
example-input.xml:12: parser error : error parsing attribute name
<ELEM=2>
     ^
example-input.xml:12: parser error : attributes construct error
<ELEM=2>
     ^
example-input.xml:12: parser error : Couldn't find end of Start Tag ELEM line 12
<ELEM=2>
     ^
[error: 1]

That said, if you had valid XML, the pipeline would likely have a shape something like the following:

input {
  # ...
}
filter {
  # replaces value at `message` with the data structure it represents
  xml {
    source => "message"
    target => "message"
  }
  # emits one event per element in data structure; operates on `message` field by default
  split {  }
}
filter {
  # any additional filters to enrich/transform the individual elements
}
output {
  elasticsearch {
    # ...
  }
}

Corrected the XML file now, please have a look @Badger

Hi @yaauie,

I've just corrected the Sample XML, please have a look.
Thanks in advance :slight_smile:

Can you provide your xml as text, preferably bound by a markdown code block? I can't copy/paste anything useful from an image :slight_smile:

It is still not valid. <ELEM=0> is missing an attribute name, and its attribute value is an unquoted number; both will trip up the XML parser. <ELEM x="0"> is what it wants.

Edited to use &gt; for the example of what it wants. The HTML parser didn't mangle the first one because it is not HTML :smiley:

I don't know why, but when I paste my XML into the editor here, it removes some tags automatically :frowning:
So I'm uploading my XML to Dropbox and sharing the link below.

Sample XML File

My config file looks like this, and it is still throwing multiple errors:

input {
  file {
    path => "/home/sample.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  xml {
    source => "message"
    store_xml => true
    target => "xmldata"
  }
}

output {
  elasticsearch {
    index => "test"
    document_type => "test_info"
    hosts => "localhost:9200"
  }
}

Apologies @Badger, please find my xml below.

Sample XML file

The linked XML is still invalid; please use xmllint or an online XML linter to get your XML into a valid state before proceeding:

<TEST_INTF>
 <ELEM=0>
        <TEST_NUM>ABC123234</TEST_NUM>
        <REF>GHI123</REF>
        <SYS>HOME</SYS>
 </ELEM>
 <ELEM=1>
        <TEST_NUM>ST000079</TEST_NUM>
        <REF>DEF123</REF>
        <SYS>HOME</SYS>
 </ELEM>
 <ELEM=2>
        <TEST_NUM>TST000116</TEST_NUM>
        <REF>ABC123</REF>
        <SYS>HOME</SYS>
 </ELEM>
</TEST_INTF>

-- sample.xml

╭─{ yaauie@castrovel:~/src/elastic/discuss-scratch/124158-xml }
╰─○ xmllint sample.xml
sample.xml:2: parser error : error parsing attribute name
 <ELEM=0>
      ^
sample.xml:2: parser error : attributes construct error
 <ELEM=0>
      ^
sample.xml:2: parser error : Couldn't find end of Start Tag ELEM line 2
 <ELEM=0>
      ^
sample.xml:6: parser error : Opening and ending tag mismatch: TEST_INTF line 1 and ELEM
 </ELEM>
        ^
sample.xml:7: parser error : Extra content at the end of the document
 <ELEM=1>
 ^
[error: 1]

If you don't have an explicit reason to use XML, I would seriously suggest avoiding using it; as stated before in this thread, while it is technically possible to parse XML when we need to, the format is not well matched to how Logstash works.

OK, there are two things to consider. The first is how to ingest the file and get one event for each outer XML element. There are a few different use cases here. If you have something like a J9 JVM garbage collection log where the JVM is forever appending XML to it, a logstash file input is an excellent fit. However, if you have one file that contains XML and it will not change and you want to ingest it then I think a file input is a poor fit (not least because logstash does not exit when it gets to EOF, it waits and tails the file), and it is much easier to use a stdin input.

If you were going to use a file input it would be something like this. You have to use auto_flush_interval because there is no second event to trigger emission of the first. I regard this as an ugly hack.

input {
  file {
    path =>  "/some/absolute/path/test.xml"
    sincedb_path => "/dev/null"
    start_position => "beginning"
    codec => multiline {
      what => "previous"
      pattern => "^" # Every line has a beginning
      auto_flush_interval => 2
    }
  }
}

With a stdin input I would do this:

(cat file.xml; echo "Monsieur Spalanzani n'aime pas la musique") | ./logstash -f ...

input {
  stdin {
    codec => multiline {
      pattern => "^Monsieur Spalanzani n'aime pas la musique"
      negate => "true"
      what => "previous"
    }
  }
}

Next up, parse the XML and split the ELEM array up.

  mutate { gsub => [ "message", "ELEM=([0-9]+)", 'ELEM SOMENAME="\1"' ] }
  xml { source => "message" target => "theXML" force_array => false }

That gives you events with this structure, and reformatting is left as an exercise for the reader.

    "theXML" => {
        "ELEM" => {
            "SOMENAME" => "2",
            "TEST_NUM" => "TST000116",
                 "REF" => "ABC123",
                 "SYS" => "HOME"
        }
    }
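
For anyone who wants a starting point on that exercise, here is a rough sketch, not the definitive answer. It assumes the xml filter above has run with target => "theXML". If your events still carry all ELEM entries as an array under [theXML][ELEM], the split filter emits one event per entry; if each event already holds a single ELEM, as in the structure shown, leave the split out. The top-level field names just reuse the tag names from the sample, and the final remove_field is optional.

```
filter {
  # one event per ELEM entry (only needed while [theXML][ELEM] is still an array)
  split { field => "[theXML][ELEM]" }

  # lift the three values to the top level of the event
  mutate {
    rename => {
      "[theXML][ELEM][TEST_NUM]" => "TEST_NUM"
      "[theXML][ELEM][REF]"      => "REF"
      "[theXML][ELEM][SYS]"      => "SYS"
    }
  }

  # drop the parsed structure and the raw XML once the values are copied out
  mutate {
    remove_field => [ "theXML", "message" ]
  }
}
```

While testing, swapping the elasticsearch output for stdout with the rubydebug codec makes it easy to check that each event ends up with just TEST_NUM, REF and SYS before anything is indexed.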
