Not able to parse custom logs containing multi-line XML

I have the following log file:

5d563f04-b5d8-4b8d-b3ac-df26028c3719 SoapRequest CheckUserPassword 
<properties>
<hostname>crt-mon</hostname>
<date>2016.11.01</date>
<time>01:23:04 CET</time>
<release>11.6</release>
<version>2.1</version>
</properties>

and my conf file is:

input {
  file {
    path => "D:\mars.log"
    type => "test-xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  grok {
    match => {
      "message" => "%{DATA:method_id} %{WORD:method_type} %{WORD:method} %{GREEDYDATA:data}"
    }
  }
  xml {
    store_xml => "false"
    source => "data"
    xpath => [
      "/properties/hostname/text()", "hostname",
      "/properties/date/text()", "date",
      "/properties/time/text()", "time",
      "/properties/release/text()", "release",
      "/properties/version/text()", "version"
    ]
  }
  mutate {
    rename => [
      "[hostname][0]", "hostname",
      "[date][0]", "date",
      "[time][0]", "time",
      "[release][0]", "release",
      "[version][0]", "version"
    ]
  }
}

output {
  elasticsearch {
    index => "find"
    hosts => ["localhost:9200"]
  }
  stdout { codec => rubydebug }
}

When I run the conf file, I don't get the XML data; it only appears inside the message field. When I put the whole log record on a single line, the XML is parsed properly, but I don't want to keep the data on a single line. Please help.

My log file got messed up above; this is the formatted log file:

5d563f04-b5d8-4b8d-b3ac-df26028c3719 SoapRequest CheckUserPassword
<properties>
<hostname>crt-mon</hostname>
<date>2016.11.01</date>
<time>01:23:04 CET</time>
<release>11.6</release>
<version>2.1</version>
</properties>

Is the record shown the full file contents or is the record structure repeated like this?

5d563f04-b5d8-4b8d-b3ac-df26028c3719 SoapRequest CheckUserPassword 
<properties>
<hostname>crt-mon</hostname>
<date>2016.11.01</date>
<time>01:23:04 CET</time>
<release>11.6</release>
<version>2.1</version>
</properties>
5d563f04-b5d8-4b8d-b3ac-df26028c3719 SoapRequest CheckUserPassword 
<properties>
<hostname>crt-mon</hostname>
<date>2016.11.01</date>
<time>01:23:04 CET</time>
<release>11.6</release>
<version>2.1</version>
</properties>
5d563f04-b5d8-4b8d-b3ac-df26028c3719 SoapRequest CheckUserPassword 
<properties>
<hostname>crt-mon</hostname>
<date>2016.11.01</date>
<time>01:23:04 CET</time>
<release>11.6</release>
<version>2.1</version>
</properties>

It is not repeated; the file contains only a single record, ending at the first closing tag.

Use Filebeat multiline. See this thread Input Json file

It's about JSON, but the same pitfalls apply to XML.
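For reference, a minimal Filebeat multiline sketch for this log format (Filebeat 5.x syntax; the path and the UUID-matching pattern are my assumptions for this example, not taken from the linked thread):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - D:\mars.log
    multiline:
      # A line that does NOT start with a UUID is a continuation
      # of the previous event, so it is appended to it
      pattern: '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
      negate: true
      match: after
```

With this, Filebeat ships each UUID-prefixed record plus its XML block as one event, and Logstash no longer sees the XML split across lines.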

Thanks for your help, but I'm not able to understand that link. Can you provide an example?
Also, I tried a few things and am now able to parse the multiline XML data, but the problem is that I only see the data when I terminate the Logstash terminal by pressing Ctrl+C (on a Windows system); otherwise the screen stays on the following lines:

[2017-11-16T16:40:57,638][INFO ][logstash.pipeline ] Pipeline main started
[2017-11-16T16:40:57,851][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}

Previous discussions.

As you can see, this kind of thing is really hard to get 100% right. Most of the time the worst problem is that the very last character in the file is not a newline.

Another possibility is to use a Python preprocessor script that reads a whole file and appends a minified XML string to another file that Filebeat tails.
A partial solution: python - Remove whitespaces in XML string - Stack Overflow
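As a sketch of that preprocessor idea (the function name and sample record are made up for illustration; only the standard library is assumed):

```python
import xml.etree.ElementTree as ET

def minify_xml(text):
    """Collapse a pretty-printed XML fragment onto a single line."""
    root = ET.fromstring(text)
    for elem in root.iter():
        # Drop whitespace-only text/tail nodes left over from indentation
        if elem.text and not elem.text.strip():
            elem.text = None
        if elem.tail and not elem.tail.strip():
            elem.tail = None
    return ET.tostring(root, encoding="unicode")

record = """<properties>
<hostname>crt-mon</hostname>
<date>2016.11.01</date>
</properties>"""
print(minify_xml(record))
# -> <properties><hostname>crt-mon</hostname><date>2016.11.01</date></properties>
```

The minified string can then be appended (with a trailing newline) to a separate file that Filebeat tails, sidestepping the multiline problem entirely.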

Thanks again for another reply.
As I said, I'm able to parse the whole XML data now, but there is one small problem.
When I run the command logstash -f testxml.CONF from the command line, I get the following lines:

D:\logstash-5.6.3\bin>logstash -f testxml.CONF
Picked up _JAVA_OPTIONS: -Xmx512M -Xms256M
Sending Logstash's logs to D:/logstash-5.6.3/logs which is now configured via log4j2.properties
[2017-11-16T16:40:51,289][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"D:/logstash-5.6.3/modules/fb_apache/configuration"}
[2017-11-16T16:40:51,320][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"D:/logstash-5.6.3/modules/netflow/configuration"}
[2017-11-16T16:40:54,080][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[2017-11-16T16:40:54,080][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2017-11-16T16:40:54,374][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2017-11-16T16:40:54,593][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2017-11-16T16:40:54,609][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-", "version"=>50001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"default"=>{"_all"=>{"enabled"=>true, "norms"=>false}, "dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date", "include_in_all"=>false}, "@version"=>{"type"=>"keyword", "include_in_all"=>false}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2017-11-16T16:40:54,609][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}
[2017-11-16T16:40:57,090][INFO ][logstash.pipeline ] Starting pipeline {"id"=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>500}
[2017-11-16T16:40:57,638][INFO ][logstash.pipeline ] Pipeline main started
[2017-11-16T16:40:57,851][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}

But the screen stays there even though I have provided stdout in my conf file. The moment I press Ctrl+C, the pipeline stops, and only after that do I get my output, like below:

[2017-11-16T16:40:57,638][INFO ][logstash.pipeline ] Pipeline main started
[2017-11-16T16:40:57,851][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2017-11-16T16:43:48,191][WARN ][logstash.runner ] SIGINT received. Shutting down the agent.
[2017-11-16T16:43:48,206][WARN ][logstash.agent ] stopping pipeline {:id=>"main"}
{
          "path" => "D:\check.xml",
      "hostname" => "KHAN",
    "@timestamp" => 2017-11-16T11:13:48.831Z,
      "@version" => "1",
          "host" => "01HW536446",
       "message" => "\r\nKHAN\r",
          "type" => "test-xml",
          "tags" => [
        [0] "multiline"
    ]
}
Terminate batch job (Y/N)? y

I need to know why the output does not appear before I press Ctrl+C.
Please help.

What is your multiline codec configuration?

My input looks like this:

input {
  file {
    path => "D:\mars.log"
    type => "test-xml"
    start_position => "beginning"
    sincedb_path => "nul"
    codec => multiline {
      pattern => "^"
      negate => true
      what => "previous"
    }
  }
}

You may need auto flush.
https://www.elastic.co/guide/en/logstash/current/plugins-codecs-multiline.html#plugins-codecs-multiline-auto_flush_interval
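Applied to the input above, that would look something like this (the 1-second interval is just an example value, not a recommendation from the docs):

```
input {
  file {
    path => "D:\mars.log"
    type => "test-xml"
    start_position => "beginning"
    sincedb_path => "nul"
    codec => multiline {
      pattern => "^"
      negate => true
      what => "previous"
      # Flush the pending multiline event if no new line arrives
      # within 1 second, so the last record is emitted without
      # waiting for Ctrl+C or for the next record to start
      auto_flush_interval => 1
    }
  }
}
```

Without auto_flush_interval, the codec holds the final event open indefinitely, waiting for a line that signals the start of the next one, which is exactly the behavior described above.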


Thank you so much for your help; I'm now able to parse the data completely and properly.


Hi, I need one more piece of help.
I want to know what changes I need to make in the conf file if I have repeated XML blocks in my log file, as below:

<properties>
<hostname>crt-moner</hostname>
<date>2016.11.02</date>
<time>01:28:04 CET</time>
<release>11.7</release>
<version>2.2</version>
</properties>
<properties>
<hostname>crt-monerqq</hostname>
<date>2017.11.02</date>
<time>05:28:04 CET</time>
<release>12.7</release>
<version>2.4</version>
</properties>

Try this:

input {
  file {
    path => "D:\mars.log"
    type => "test-xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "^<properties>"
      negate => true
      what => "previous"
    }
  }
}

Thanks a lot, it worked :slight_smile:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.