Unable to parse XML through Logstash

Bineeta_Das_IN · June 14, 2022, 11:29am

I am new to the ELK stack.
I am trying to parse the XML snippet below with Logstash:

<?xml version="1.0"?> Gambardella, Matthew XML Developer's Guide Computer 44.95 2000-10-01 An in-depth look at creating applications with XML. Ralls, Kim Midnight Rain Fantasy 5.95 2000-12-16 A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.

Below is my config:

input {
  file {
    path => "/home/testuser/test/test.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "^<?book .*>"
      negate => true
      what => "previous"
    }
  }
}

filter {
  xml {
    source => "message"
    target => "parsed"
  }
  split {
    field => "[parsed][book]"
    add_field => {
      bookAuthor => "%{[parsed][book][author]}"
      title => "%{[parsed][book][title]}"
      genre => "%{[parsed][book][genre]}"
      price => "%{[parsed][book][price]}"
      publish_date => "%{[parsed][book][publish_date]}"
      description => "%{[parsed][book][description]}"
    }
  }
}

output {
  elasticsearch {
    hosts => "127.0.0.1"
    index => "xmlnew-test"
    codec => rubydebug
  }
}

Although Logstash runs without any errors, no index is created.

I changed the owner and group of my XML file and restarted Logstash, but that did not help.

[testuser@test ~]$ ls -la /home/testuser/test/test.xml
-rwxrwxrwx. 1 logstash logstash 4405 Jun 14 10:23 /home/testuser/test/test.xml

Kindly suggest.

Bineeta_Das_IN · June 14, 2022, 11:31am

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>

Bineeta_Das_IN · June 14, 2022, 11:32am

I have pasted the XML snippet above.

grumo35 · June 14, 2022, 12:47pm

Hello,

What do the Logstash logs say when it picks up the config or starts reading your file?

Bineeta_Das_IN · June 14, 2022, 1:11pm

This is what I see in the logs:

Jun 14 10:53:31 test logstash[14253]: [2022-06-14T10:53:31,519][INFO ][logstash.outputs.elasticsearch][main] Elasticsearch version determined (7.17.4) {:es_version=>7}
Jun 14 10:53:31 test logstash[14253]: [2022-06-14T10:53:31,521][WARN ][logstash.outputs.elasticsearch][main] Detected a 6.x and above cluster: the type event field won't be used to determine the document _type {:es_version=>7}
Jun 14 10:53:31 test logstash[14253]: [2022-06-14T10:53:31,593][INFO ][logstash.outputs.elasticsearch][main] Config is not compliant with data streams. data_stream => auto resolved to false
Jun 14 10:53:31 test logstash[14253]: [2022-06-14T10:53:31,604][INFO ][logstash.outputs.elasticsearch][main] Config is not compliant with data streams. data_stream => auto resolved to false
Jun 14 10:53:31 test logstash[14253]: [2022-06-14T10:53:31,737][INFO ][logstash.outputs.elasticsearch][main] Using a default mapping template {:es_version=>7, :ecs_compatibility=>:disabled}
Jun 14 10:53:32 test logstash[14253]: [2022-06-14T10:53:32,275][INFO ][logstash.javapipeline ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>250, "pipeline.sources"=>["/etc/logstash/conf.d/test.conf"], :thread=>"#<Thread:0x7c4680c4 run>"}
Jun 14 10:53:33 test logstash[14253]: [2022-06-14T10:53:33,210][INFO ][logstash.javapipeline ][main] Pipeline Java execution initialization time {"seconds"=>0.93}
Jun 14 10:53:33 test logstash[14253]: [2022-06-14T10:53:33,321][INFO ][logstash.javapipeline ][main] Pipeline started {"pipeline.id"=>"main"}
Jun 14 10:53:33 test logstash[14253]: [2022-06-14T10:53:33,378][INFO ][filewatch.observingtail ][main][f22f45860093b5e6671036e486fe4177ee0847e7bf0d38553424c322662bf783] START, creating Discoverer, Watch with file and sincedb collections
Jun 14 10:53:33 test logstash[14253]: [2022-06-14T10:53:33,411][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}

Badger · June 14, 2022, 3:56pm

Your multiline codec is waiting for a line that matches /^<?book .*>/. Once it sees one, it will flush an event onto the pipeline. You probably need to change the pattern and also add the auto_flush_interval option to the codec; otherwise you will never get an event for the last book in the catalog.
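
To make that suggestion concrete, here is a minimal sketch of one possible input block, not a tested fix. In the sample as pasted, the <book> lines are indented, so a pattern that anchors book (or <book) directly at the start of the line cannot match them and the codec keeps accumulating lines. The sketch assumes the whole file should become a single event, so that the existing xml and split filters can operate on [parsed][book]. The pattern and the 2-second interval are illustrative choices, not values from this thread; auto_flush_interval is the option named above.

```
input {
  file {
    path => "/home/testuser/test/test.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      # Assumption: each file starts with an XML declaration at column 0.
      # Lines that do NOT match the pattern are appended to the previous
      # line, so the whole document accumulates into one event.
      pattern => "^<\?xml "
      negate => true
      what => "previous"
      # Flush the pending event after 2 seconds without new lines.
      # Without this, the last (here: only) event is never emitted.
      auto_flush_interval => 2
    }
  }
}
```

While testing, temporarily adding a stdout { codec => rubydebug } output shows whether any event reaches the pipeline at all, independently of whether the index gets created. For much larger files, the codec's limits on lines and bytes per event may also need raising.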
