Unable to parse XML through Logstash

I am new to the ELK stack.
I am trying to parse the XML snippet below with Logstash:

<?xml version="1.0"?> Gambardella, Matthew XML Developer's Guide Computer 44.95 2000-10-01 An in-depth look at creating applications with XML. Ralls, Kim Midnight Rain Fantasy 5.95 2000-12-16 A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.

This is my config:

input {
  file {
    path => "/home/testuser/test/test.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "^<?book .*>"
      negate => true
      what => "previous"
    }
  }
}

filter {
  xml {
    source => "message"
    target => "parsed"
  }
  split {
    field => "[parsed][book]"
    add_field => {
      bookAuthor => "%{[parsed][book][author]}"
      title => "%{[parsed][book][title]}"
      genre => "%{[parsed][book][genre]}"
      price => "%{[parsed][book][price]}"
      publish_date => "%{[parsed][book][publish_date]}"
      description => "%{[parsed][book][description]}"
    }
  }
}

output {
  elasticsearch {
    hosts => "127.0.0.1"
    index => "xmlnew-test"
    codec => rubydebug
  }
}

Although Logstash runs without any errors, no index is created.

I changed the user and group permissions on my XML file and restarted Logstash, but that did not help.

[testuser@test ~]$ ls -la /home/testuser/test/test.xml
-rwxrwxrwx. 1 logstash logstash 4405 Jun 14 10:23 /home/testuser/test/test.xml

Kindly suggest.

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>

I have pasted the full XML snippet above.

Hello,

What do the Logstash logs say when it picks up the config or starts reading your file?

This is what I see in the logs:

Jun 14 10:53:31 test logstash[14253]: [2022-06-14T10:53:31,519][INFO ][logstash.outputs.Elasticsearch][main] Elasticsearch version determined (7.17.4) {:es_version=>7}
Jun 14 10:53:31 test logstash[14253]: [2022-06-14T10:53:31,521][WARN ][logstash.outputs.Elasticsearch][main] Detected a 6.x and above cluster: the type event field won't be used to determine the document _type {:es_version=>7}
Jun 14 10:53:31 test logstash[14253]: [2022-06-14T10:53:31,593][INFO ][logstash.outputs.Elasticsearch][main] Config is not compliant with data streams. data_stream => auto resolved to false
Jun 14 10:53:31 test logstash[14253]: [2022-06-14T10:53:31,604][INFO ][logstash.outputs.Elasticsearch][main] Config is not compliant with data streams. data_stream => auto resolved to false
Jun 14 10:53:31 test logstash[14253]: [2022-06-14T10:53:31,737][INFO ][logstash.outputs.Elasticsearch][main] Using a default mapping template {:es_version=>7, :ecs_compatibility=>:disabled}
Jun 14 10:53:32 test logstash[14253]: [2022-06-14T10:53:32,275][INFO ][logstash.javapipeline ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>250, "pipeline.sources"=>["/etc/logstash/conf.d/test.conf"], :thread=>"#<Thread:0x7c4680c4 run>"}
Jun 14 10:53:33 test logstash[14253]: [2022-06-14T10:53:33,210][INFO ][logstash.javapipeline ][main] Pipeline Java execution initialization time {"seconds"=>0.93}
Jun 14 10:53:33 test logstash[14253]: [2022-06-14T10:53:33,321][INFO ][logstash.javapipeline ][main] Pipeline started {"pipeline.id"=>"main"}
Jun 14 10:53:33 test logstash[14253]: [2022-06-14T10:53:33,378][INFO ][filewatch.observingtail ][main][f22f45860093b5e6671036e486fe4177ee0847e7bf0d38553424c322662bf783] START, creating Discoverer, Watch with file and sincedb collections
Jun 14 10:53:33 test logstash[14253]: [2022-06-14T10:53:33,411][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}

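Your multiline codec is waiting for a line that matches /^<?book .*>/. Once it sees one it will flush an event onto the pipeline. In that regex, `<?` means "an optional <", and `^` anchors the match at the start of the line, but the <book> lines in your file are indented, so no line ever matches. With negate => true and what => "previous", every line is therefore appended to the previous one and nothing is ever flushed. You probably need to change the pattern, and also add the auto_flush_interval option to the codec, otherwise you will never get an event for the last book in the catalog.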
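
As a rough sketch of what that change could look like (one option among several, not tested against this exact setup): the codec below assumes the file holds a single catalog document whose first line is the <?xml ...?> declaration starting in column 0. It starts a new event at the declaration rather than at each <book> line, so the whole catalog should arrive as one event and the existing xml and split filters can keep working on [parsed][book]. The 2-second auto_flush_interval is an arbitrary value chosen for illustration.

```
input {
  file {
    path => "/home/testuser/test/test.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      # Assumption: the XML declaration is the first line and is not indented.
      # Any line that does NOT match is appended to the previous line.
      pattern => "^<\?xml"
      negate => true
      what => "previous"
      # Flush the pending event after 2 seconds without new lines,
      # since no further matching line will arrive to trigger a flush.
      auto_flush_interval => 2
    }
  }
}
```

If you would rather emit one event per book, the pattern would have to allow for the leading whitespace before <book, and the <catalog> wrapper lines would then end up in the first and last events, where the xml filter is likely to reject them as malformed.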
