I am trying to parse an XML file in Logstash, and I want to use XPath to do the parsing. When I run my config file, the data loads into Elasticsearch, but not in the way I want: each line of the XML document is loaded as a separate event.
Yes, because a file input creates one event for each line of the file. The fix depends on the exact problem you are trying to solve. Do you always want to capture the entire file as a single event? Are there ever two stations elements in a file? (I can see there are multiple station elements; I am asking about the enclosing stations element.) Is stations always the outermost element you are dealing with?
Append a line containing some pattern that you are confident will not occur in the XML (easy in this case), then use a stdin input with a multiline codec. That should capture the XML as a single event, which you can start attacking with an xml filter.
(cat file.xml; echo "Monsieur Spalanzani n'aime pas la musique") | ./logstash -f ...

input {
  stdin {
    codec => multiline {
      pattern => "^Monsieur Spalanzani n'aime pas la musique"
      negate => "true"
      what => "previous"
    }
  }
}
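Once the whole document arrives as one event, an xml filter can apply XPath expressions to the message field. The following is only a minimal sketch: the stations element is taken from this thread, while the station and name child elements and the station_name field are hypothetical and need to be replaced with names from the real document.

```
filter {
  xml {
    # The multiline codec leaves the whole XML document in "message".
    source => "message"
    # Only run the XPath expressions; do not store the full parsed tree.
    store_xml => false
    xpath => {
      # Hypothetical path and field name: adjust to the actual XML layout.
      "/stations/station/name/text()" => "station_name"
    }
  }
}

output {
  # Pretty-print every event to stdout so the result can be inspected.
  stdout { codec => rubydebug }
}
```

Each XPath result is stored as an array in the destination field, so a document with several matching elements yields several values in station_name.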
Did you try exactly what I suggested? There is no out-of-the-box codec that captures the entire contents of a file and inputs it as an event. You have to append a marker to the file and tell the multiline codec to look for that marker.
I guess I didn't understand the suggestion properly. What I took from your comment was to use a pattern that will not repeat, so that the file is captured as a single event. All the data I have is under the stations XML tag. Can you suggest which pattern I should use?
Please post text rather than images. I am not going to go and OCR that in order to be able to read it.
I am not asking for the logstash log file. I am looking for what got written to stdout. It will be pretty-printed, like this (but with different data, obviously)
I got output similar to what you pasted above, but one thing did not go as I planned.
Whenever I run the config file, I do not get the pretty-printed response immediately. For a few hours I only see the message that Logstash is running. But when my laptop goes to sleep, or after I restart several times, I suddenly get the pretty-printed response with the message, the multiline tag, the host name and so on.
• Value type is string
• There is no default value for this setting.
Path of the sincedb database file (keeps track of the current position of monitored log files) that will be written to disk. The default will write sincedb files to <path.data>/plugins/inputs/file. NOTE: it must be a file path and not a directory path.
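As an illustration of how this setting is commonly used when re-reading the same file during testing, here is a hedged sketch of a file input. The path /path/to/file.xml is a placeholder, and /dev/null assumes a Unix-like system (on Windows, NUL is the usual equivalent). The auto_flush_interval setting is an addition not discussed above; it may help if the delayed output is caused by the multiline codec waiting for a further matching line, which is only one possible explanation.

```
input {
  file {
    # Placeholder path: point this at the real XML file.
    path => "/path/to/file.xml"
    # Read the file from the start rather than tailing it.
    start_position => "beginning"
    # Do not persist the read position, so every run re-reads the file.
    sincedb_path => "/dev/null"
    codec => multiline {
      # A pattern that should never occur in the XML, so every line
      # is appended to the previous one.
      pattern => "^Monsieur Spalanzani n'aime pas la musique"
      negate => true
      what => "previous"
      # Emit the accumulated lines as an event after 1 second
      # without new data.
      auto_flush_interval => 1
    }
  }
}
```

With the timed flush, the marker line does not have to be appended to the file, because the event is emitted once no new lines have arrived for the configured interval.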