I am looking for recommendations. I have a nested XML file that needs to be used as input to Logstash.
My XML has the following structure:
<subsystem>
  <component>
    <Parameter>
    </Parameter>
    <Parameter>
    </Parameter>
  </component>
  <component2>
    <Parameter>
    </Parameter>
    <Parameter>
    </Parameter>
  </component2>
</subsystem>
What I am trying to achieve is to get all parameters into Elasticsearch, with additional fields that record which component and subsystem each parameter belongs to. I have about 10 different subsystems in the file; each has different components, and each component has different parameters. What's the best way to approach this?
So far I have a Logstash config that just lists all the fields, but the parameters are not linked to their subsystem or component.
It is not clear what you want to achieve, so let me say what I would do with data like that. The first thing to do is use a multiline input codec to join all the parts of a <subsystem> together. That's going to look something like this if subsystems are separated by a blank line.
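A minimal sketch of such a codec, assuming every subsystem block is separated from the next by a blank line. The file path is a placeholder, and the pattern and auto_flush_interval value are assumptions, not tested settings:

```
input {
  file {
    # Placeholder path; point this at the real XML file.
    path => "/path/to/file.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      # A blank line starts a new event; every non-blank line
      # is appended to the event that is being built.
      pattern => "^\s*$"
      negate => true
      what => "previous"
      # Flush the last subsystem even though no blank line follows it.
      auto_flush_interval => 2
    }
  }
}
```

If the file has no blank lines between subsystems, a pattern anchored on the opening subsystem tag would be the equivalent starting point.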
The message will have newlines embedded in it, so they have to be mutated away. Note that there is no fancy quoting scheme to get a newline into a string; you just put the opening and closing quotes on different lines.
mutate {
gsub => [ "message", "
", "" ]
}
Then we need to parse the XML, which looks like this
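A minimal sketch of the parsing step, using the xml filter on the joined message. The target field name parsed is a hypothetical choice, and force_array is set explicitly so that repeated and single elements both come out as arrays:

```
filter {
  xml {
    # The joined, newline-free subsystem text.
    source => "message"
    # Hypothetical field name that will hold the parsed structure.
    target => "parsed"
    # Always wrap child elements in arrays, even when there is only one.
    force_array => true
  }
}
```

With this setup each component name becomes a key under parsed, and its Parameter elements become an array under that key.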
What I have is a nested XML file with the following content:
- a number of subsystems
- inside each subsystem, a number of components
- inside each component, a number of parameters
This structure repeats, so I configured the following input in Logstash:
input {
  file {
    path => "/files/pipo.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "<(subsystem|component|parameters)"
      negate => true
      what => "previous"
    }
  }
}
This works well for me, except that it splits the results into separate JSON documents: one for the subsystem, one for each component, and one for the parameters. That is too fragmented.
I want a single JSON document per subsystem that contains all the values for its components and parameters.
I want it represented that way because components and parameters belong to a specific subsystem, and I need to group the data by subsystem.
Any idea how to achieve this?
Even better would be one JSON document per parameter that includes the component and the subsystem it belongs to.
For example:
{parameter = test1243
component = test2
subsystem = pipo
}
{parameter = 19172
component = test2
subsystem = pipo
}
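One possible way to get one event per parameter, as a rough sketch, not a tested pipeline: flatten the parsed XML in a ruby filter, then use the split filter. It assumes the XML has already been parsed into a hypothetical field named parsed with force_array enabled, so that each component key maps to an array of hashes and each Parameter key maps to an array. The sample does not show where the subsystem name is stored, so that field is not set here.

```
filter {
  ruby {
    code => '
      rows = []
      # "parsed" is a hypothetical field holding the xml filter output.
      (event.get("parsed") || {}).each do |component, entries|
        next unless entries.is_a?(Array)
        entries.each do |entry|
          next unless entry.is_a?(Hash)
          # One row per Parameter, tagged with its component name.
          (entry["Parameter"] || []).each do |value|
            rows << { "component" => component, "parameter" => value }
          end
        end
      end
      event.set("parameters", rows)
    '
  }
  # Emit one event per element of the "parameters" array.
  split {
    field => "parameters"
  }
}
```

After the split, each event carries a parameters field holding a single component and parameter pair, which can then be renamed or enriched with the subsystem name before being sent to Elasticsearch.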