Context: I need to put several million XML invoices into Elasticsearch and query and aggregate over their content.
Proposed solution: use the Logstash xml filter plugin to map the XML to JSON.
First question: For my context, would this be the best solution?
If so, second question: can I create the filter definition automatically from an XSD describing the invoices? The XSD contains about 500 possible fields, and I will have to create about 10 different indices.
Well, if you need to use xpath, I suppose in theory you could use XSLT to transform an XSD into a set of xpath expressions. But I would just let the xml filter parse the whole document, something like the sketch below.
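A minimal sketch of that idea, assuming the whole invoice arrives in the message field (see further down) and using an illustrative invoice target field; source, target, store_xml, and force_array are all standard xml filter options:

```
filter {
  xml {
    # Parse the XML text held in [message].
    source => "message"
    # Store the parsed document as a nested object under [invoice];
    # all ~500 possible fields are mapped automatically, so nothing
    # has to be generated from the XSD.
    target => "invoice"
    store_xml => true
    # By default every element is wrapped in an array, which makes
    # querying and aggregating awkward.
    force_array => false
  }
}
```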
The file input will read any files that match the path, so I would expect you to use

path => "C:/elastic/XML4Logstash/*.xml"

Note that the file input requires forward slashes in the path, even on Windows.
If you run Logstash once, it will read the files and record that fact in the sincedb. If you restart Logstash, it will know it has already read the files and start tailing them from the point it had read to, so that anything appended can be processed. That is probably not useful in your case; however, it will read any new files.
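When testing, you often want the same files re-read on every run. A sketch of the usual way to do that, assuming Windows ("NUL" is the Windows equivalent of /dev/null); start_position and sincedb_path are standard file input options:

```
input {
  file {
    path => "C:/elastic/XML4Logstash/*.xml"
    # Read existing files from the start instead of tailing them.
    start_position => "beginning"
    # Discard the read-position bookkeeping so the files are
    # re-read on every run.
    sincedb_path => "NUL"
  }
}
```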
If you want to consume an entire file as a single event then you will need a multiline codec on the file input; there is a sketch after the next paragraph. If you do not use one, each line of the file will be a separate event.
The event will contain the contents of the file (either a single line, or the output of the multiline codec) in a field called "message", so that should be the source for your xml filter.
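Putting that together, a sketch of the whole-file trick: the pattern is an arbitrary string chosen never to match a real line, so with negate => true every line gets appended to the same event; pattern, negate, what, auto_flush_interval, and max_lines are all standard multiline codec options:

```
input {
  file {
    path => "C:/elastic/XML4Logstash/*.xml"
    start_position => "beginning"
    sincedb_path => "NUL"
    codec => multiline {
      # Never matches a real line, so every line is appended to
      # the previous event and the whole file becomes one event.
      pattern => "^THIS_WILL_NEVER_MATCH"
      negate => true
      what => "previous"
      # Flush the accumulated event once no new lines arrive;
      # without this the single event would never be emitted.
      auto_flush_interval => 2
      # The default is 500 lines; raise it if invoices are longer.
      max_lines => 20000
    }
  }
}
```

The resulting message field then feeds the xml filter sketched above.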