XML import


(Michael) #1

XML Import with Logstash

Hi there

I have read just about every XML filter import example I can find. However, I cannot figure out how to import this specific XML format through Logstash.

Use case

The government of Denmark provides a public XML feed with data from the motor vehicle registry. I would like to import this data into Elasticsearch.

I want to store it so I can play around with analysing the different types of data.

Anyway, the format contains nodes with Danish names from the source (sorry).

XML Format

The element that repeats for each registration is ns:Statistik.

I would like to import the entire XML tree within ns:Statistik as a document of type "vehicle". Each node should become a field, and its content should become that field's value or nested object.

The XML file format can be found here:

Issue

Hardly any of my logstash.conf works. I have gotten two different results when importing: I have successfully imported the entire XML file as one document, and I have successfully imported each line as a separate document.

But apparently I don't understand the XML filter documentation. CSV import etc. I know, and it seems so simple in comparison. XML is really difficult to understand, and I cannot get any of the examples I've found on the forum or on Stack Overflow to work either.

Hopefully one of you could provide me with a complete Logstash config, so that I can learn from it how to use the XML filter in the future.

Thanks for reading


#2

I'm on mobile, so I'll keep this short and untested. Your configuration will probably have to look similar to this:

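filter {
  # Parse the XML string in "message" into a nested structure under "doc"
  xml {
    target => "doc"
    source => "message"
  }
  # The raw XML string is no longer needed once it has been parsed
  mutate {
    remove_field => ["message"]
  }
  # Emit one event per ns:Statistik element
  split {
    field => "[doc][ns:StatistikSamling][ns:Statistik]"
  }
}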

The split filter will separate the documents the way you want. After that, you have to rename and delete fields until you have a nice structure.
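A hedged sketch of that clean-up step, assuming each split event should end up with its registration record under a top-level field called vehicle (a name borrowed from the question, not something Logstash requires). Logstash concatenates multiple filter sections in the order they appear, so this block can follow the one above:

```
filter {
  # After split, each event holds exactly one ns:Statistik record at this path;
  # move it to a top-level field with a friendlier (hypothetical) name
  mutate {
    rename => { "[doc][ns:StatistikSamling][ns:Statistik]" => "vehicle" }
  }
  # Drop whatever is left of the parsed wrapper tree
  mutate {
    remove_field => ["doc"]
  }
}
```

The Danish field names inside vehicle are left as they are; further rename entries could be added to the first mutate for individual fields.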

(If you have a configuration that is not working and are looking for help, it's always good to post what you've got. Solving a problem is easier if you have something to build on. And people are probably more motivated if the post is not basically saying 'Please do my job for me' :slight_smile:)


(Michael) #3

Hi Jenni,

I will take your feedback into account. It was in no way intended as a "Please do my job for me" post. I want to learn how to work with this XML import, and I have been trying for weeks now, including reaching out on IRC.

Nevertheless, I appreciate your feedback, and I have had a go at it. Your example first of all made it clear to me how I had misunderstood the XML filter documentation entirely: I was attempting to create multiple nodes with xpath.

It still does not work entirely, but I would like to fiddle a little with your example and the split filter. I am getting the _xmlparsefailure and _split_type_failure tags, but I think I can solve these.
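One possible cause of _split_type_failure, offered as an assumption rather than a confirmed diagnosis: the xml filter's force_array option defaults to true, so every element, including ns:StatistikSamling, is stored as an array. A path like [doc][ns:StatistikSamling][ns:Statistik] then resolves to nothing, and the split filter tags events it cannot split. A sketch that turns force_array off and prints only the failed events for inspection:

```
filter {
  xml {
    source => "message"
    target => "doc"
    # Store single elements as plain values instead of one-element arrays;
    # repeated elements such as ns:Statistik still become arrays
    force_array => false
  }
}

output {
  # Print only events that failed parsing or splitting, with all their fields
  if "_xmlparsefailure" in [tags] or "_split_type_failure" in [tags] {
    stdout { codec => rubydebug }
  }
}
```

_xmlparsefailure means the message field did not contain parseable XML, which usually points back at how the input groups lines into events. Note that with force_array => false, a file containing a single ns:Statistik would yield a hash rather than an array at that path.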

Thank you for your example, it was just what I needed to move on.


#4

I hope I didn't offend you with that last line. I didn't mean any harm :smile: Feel free to ask if you get stuck again!


(Michael) #5

Absolutely not, just wanted to clarify :slight_smile:

Thank you! I will reach out if I get stuck. The kid is asleep now, so I am giving it a go.


(Michael) #6

Hi Jenni,

I think I've located my issue, but I am unable to figure out how exactly to solve it.

It seems my codec is not matching each XML "document" node correctly.

I've tried:
pattern => "^<ns:ESStatistikListeModtag_I>"
pattern => "^<ns:Statistisk>"
(also tested without ns)

With multiline I just can't make it import anything; if I remove it, each line is imported separately.

input {
  file {
    path => "/usr/share/logstash/files/test.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    stat_interval => 1
    codec => multiline {
      pattern => "^<ns:ESStatistikListeModtag_I>"
      negate => "true"
      what => "previous"
    }
  }
}


#7

Your ESStatistikListeModtag_I element has an attribute, so ^<ns:ESStatistikListeModtag_I> (which expects the tag to close directly after its name) will never match.
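A minimal sketch of the input with that fixed, assuming the root element starts at the beginning of a line and the whole file should become one event for the xml and split filters. The pattern simply leaves off the closing >. auto_flush_interval, max_lines and max_bytes are real multiline codec options, but the limits below are placeholder values to adjust to the actual file size:

```
input {
  file {
    path => "/usr/share/logstash/files/test.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      # No closing ">", so the root tag may carry attributes (e.g. xmlns:ns="...")
      pattern => "^<ns:ESStatistikListeModtag_I"
      # Every line that does NOT match is appended to the previous line
      negate => true
      what => "previous"
      # Emit the buffered event after 1 second without new lines; otherwise
      # the last (here: only) event waits for a matching line that never comes
      auto_flush_interval => 1
      # Placeholder limits; the defaults (500 lines, 10 MiB) would cut a
      # large file into several events
      max_lines => 1000000
      max_bytes => "512 MiB"
    }
  }
}
```

If the file starts with an <?xml ...?> declaration, that line ends up in a small event of its own, which the xml filter may tag with _xmlparsefailure.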


#8

You need to understand the multiline codec plugin. Your mistake is in the what option.


#9

Hope this code helps you,


(Michael) #10

Yeah, you were absolutely right. I've found the solution now.

Got it all working now. Thanks a ton for your time, Jenni.


(system) #11

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.