input {
  file {
    path => "/usr/share/logstash/bin/myXML.xml"
    start_position => "beginning"
  }
}
#filter {
#  I DON'T KNOW WHAT TO PUT HERE
#}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    user => "elastic"
    password => "changeme"
  }
  stdout {}
}
I am confused as to how I should build my config file, the filters in particular. The XML file will have close to a hundred fields (hence the ". . . "), and some have sub-fields (an object-oriented way of encapsulating data within other data, like a class in Java). Is there a way to dynamically parse the XML file so I don't have to manually define the fields and their contents?
You should use a multiline codec on the input to consume the entire file as a single event. There are many threads about how to do that. Then you can use a Logstash xml filter to parse the XML.
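For what it's worth, a minimal sketch of that approach could look like the config below. The never-matching pattern, the sincedb_path, and the parsed_xml target field are illustrative choices, not something prescribed by the plugins; adjust them to your file.

input {
  file {
    path => "/usr/share/logstash/bin/myXML.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    # Aggregate every line into one event: the pattern is chosen so it
    # never matches, and auto_flush_interval flushes the buffered lines
    # once the file stops producing new ones.
    codec => multiline {
      pattern => "^THIS_WILL_NEVER_MATCH"
      negate => true
      what => "previous"
      auto_flush_interval => 2
      max_lines => 100000
    }
  }
}
filter {
  xml {
    source => "message"
    target => "parsed_xml"   # hypothetical field to hold the parsed tree
    store_xml => true
    force_array => false
  }
}

With store_xml enabled the filter builds the whole tree under the target field, so individual tags do not have to be listed by hand.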
Ok. My bad, I didn't see your configuration.
With your configuration, Logstash will read each line as a new event. To fix that, use the multiline codec. The multiline codec will aggregate multiple lines into a single log event; in this case it will produce one event containing the whole XML file. There are plenty of samples around multiline, please go through them. As I am on a mobile device I am unable to give you an exact config.
Secondly, once we have aggregated the lines into a single XML document, use Ruby code.
@Badger does the Logstash xml filter work to dynamically parse out every tag in the XML? In other words, do I have to specify in the filter each field that exists in my XML?
@Suman_Reddy1 Thank you. I will take a look into Ruby and Nokogiri. If you have any more info once you have time, I'd really appreciate that too. I'll try and keep learning.
Below is a recursive way of iterating over all elements in an XML document.
ruby {
  code => "
    require 'nokogiri'

    # Recursively walk every child node; print the content of text nodes
    # and descend into element nodes.
    def iterative(ele)
      ele.children.each do |tempNode|
        if tempNode.text?
          puts tempNode.content
        else
          iterative(tempNode)
        end
      end
    end

    xml_doc = Nokogiri::XML.parse(event.get('xml-data'))
    iterative(xml_doc)
  "
}
Above is the sample we used to parse XML and do some inline masking on the data. This should give you some insight into XML processing. If you don't have to do much manipulation on the XML, I would suggest Badger's solution rather than this one.
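If you do end up modifying the document in a ruby filter, a rough sketch of the masking idea might look like this (the <password> element name and the masked_xml field are made-up examples, not from the original config):

ruby {
  code => "
    require 'nokogiri'

    # Hypothetical masking example: blank out the text of every
    # <password> element and store the rewritten document on the event.
    doc = Nokogiri::XML.parse(event.get('message'))
    doc.xpath('//password').each { |node| node.content = '*****' }
    event.set('masked_xml', doc.to_xml)
  "
}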