How to set up Logstash to send a whole XML event log to Elasticsearch

I am trying to set up Logstash to send XML event logs to Elasticsearch, but it is inserting each line as a separate entry into ES. I tried using the multiline codec, but it buffers a fixed number of lines before inserting into ES. For example, if event 1 has 100 lines and event 2 has 500 lines, it takes the 100 lines from the first event plus 400 lines from the second, combines them, and inserts that into ES. For the next insert it then waits until another 400 lines are received.

I just want each XML log event to be inserted into ES as is. What am I missing in my config?

input {
  file {
    path => "/opt/appserver/application*.log"
    start_position => "beginning"
  }
}
output {
  elasticsearch { hosts => ["http://ip.address:port"] }
  stdout { codec => "rubydebug" }
}

If the XML is spread across multiple lines you will need to use a multiline codec to combine the lines of each XML document into one event. You can set the auto_flush_interval to cause it to flush before the next document arrives.
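A minimal sketch of that approach (the pattern and interval here are assumptions; match them to the actual log format):

```
input {
  file {
    path => "/opt/appserver/application*.log"
    start_position => "beginning"
    # Any line that does not start a new XML document is appended to
    # the previous event. The pattern is an assumption; it should match
    # the first line of each document in the log.
    codec => multiline {
      pattern => "^<data>"
      negate => true
      what => "previous"
      # Flush a buffered event if no new line arrives within 2 seconds,
      # so the last document is not held until the next one starts.
      auto_flush_interval => 2
    }
  }
}
```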

Thank you for the response. I don't think that will work for me, as there could be hundreds of events in a second, so I guess it is going to combine all the events within a one-second span. At least your response points me in the right direction; I may need to use some kind of pattern or other options, it seems.

I am surprised there is no plug-in of some sort that takes an XML event and dumps it into ES as is. Perhaps my lack of knowledge of Logstash configuration is missing something that may be very simple.

Maybe there is some option without the multiline codec, since it seems it is always going to get in the way?

There is a Logstash filter to process XML docs.

    filter {
      xml {
        source => "message"
      }
    }
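For example, the parsed fields can be collected under a single field via the filter's target option (a sketch; the field name "parsed_xml" is an illustrative assumption, not from this thread):

```
filter {
  xml {
    source => "message"
    # Store the parsed document under one field instead of merging
    # its elements into the top level of the event.
    target => "parsed_xml"
  }
}
```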

With just the xml filter mentioned above, it inserts each line of the XML log into ES as one record. I want the whole XML document, i.e. I want the following record to appear as one event in ES; instead it appears as six records, each line being one record. With multiline it buffers up to the multiline limit and records that as one record; I don't want that either.

<data>
 <one>
  <key1>Value1</key1>
  <key2>Value2</key2>
 </one>
</data>

Then you have misconfigured the multiline codec. However, since you have given no indication of how you are configuring it, we cannot help you.

Here is my configuration. (With the line limit it buffers up to that many lines; if no value is given, it takes the default of 500 and buffers that many lines.)

input {
  file {
    path => "/opt/app*.log"
    start_position => "beginning"
    codec => multiline {
      pattern => "^<\?data.*\>"
      negate => true
      what => "previous"
      max_lines => 10000
    }
  }
}
filter {
  xml {
   store_xml => false
   source => "message"
  }
 }
output {
  elasticsearch { hosts => ["ip:port"] }
  stdout { codec => "rubydebug" }
}

Once that codec sees a line that starts with <?data it should accumulate lines until it sees another one. What do the contents of the file look like?

If you set store_xml to false and do not use the xpath option then the xml filter is a no-op.
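A sketch of combining store_xml => false with the xpath option to extract specific values (the XPath expressions and destination field names below are assumptions based on the sample document in this thread):

```
filter {
  xml {
    source => "message"
    store_xml => false
    # With store_xml disabled, the filter only does useful work when
    # xpath is set; each expression/field pair copies one value into
    # the event.
    xpath => [
      "/data/one/key1/text()", "key1_value",
      "/data/one/key2/text()", "key2_value"
    ]
  }
}
```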

For example, the content has two events; I want this inserted as two records in ES.

<data>
 <one>
  <key1>Value1</key1>
  <key2>Value2</key2>
 </one>
</data>

<data>
 <one>
  <key1>Value1</key1>
  <key2>Value2</key2>
  <key3>Value3</key3>
  <key4>Value4</key4>
 </one>
</data>

Try

pattern => "^<data>"
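In context, that pattern would replace the one in the multiline codec from the configuration posted earlier (a sketch; the other codec settings are carried over unchanged):

```
input {
  file {
    path => "/opt/app*.log"
    start_position => "beginning"
    codec => multiline {
      # Every line that does NOT start a new <data> document is folded
      # into the previous event.
      pattern => "^<data>"
      negate => true
      what => "previous"
      max_lines => 10000
    }
  }
}
```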

Thank you so much, it does help. At least I see the first event in ES, but it looks like it is waiting for another event, i.e. a third one in the log, before inserting the second one (I have only two events in the log). I think I may be able to figure it out.

That is why you should use auto_flush_interval.

Thank you. Sometimes I do see one event with just this one line:

</data>

(I am not showing the full log; it is probably not necessary, as it is too big and may not help.) The previous event is missing this line. Also, I fixed the pattern to look for the first line starting with <?xml etc.

I don't understand the reason for this behavior (a log event could be big, maybe 200-300 lines, but I don't think events overlap within one second, though I am not sure):

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE application SYSTEM 'http\://machine/application.dtd'>

<data>
 <one>
  <key1>Value1</key1>
  <key2>Value2</key2>
 </one>
</data>

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE application SYSTEM 'http\://machine/application.dtd'>

<data>
 <one>
  <key1>Value1</key1>
  <key2>Value2</key2>
  <key3>Value3</key3>
  <key4>Value4</key4>
 </one>
</data>

Try

codec => multiline {
  pattern => '^<\?xml '
  negate => true
  what => previous
  auto_flush_interval => 1
}

Thank you again. I have the exact same pattern. I am wondering if it is something to do with how the application writes the log file, i.e. in some instances maybe a line is not written to the file until later, as I do see a pattern: the application has request and response logs, and it is always the last line of the response that gets added as a new line.

Check out this GitHub repo; it might help you:
https://github.com/kevinfealey/logstash-multiline-xml-parsing-example