Parse a basic XML file

Hi All,

Could you please help me? I'm having trouble parsing a basic XML file.

This is an example of my xml:

<hotel><name>Hotel 1</name><rooms>22</rooms><kitchens>1</kitchens><restaurants>2</restaurants></hotel>
<hotel><name>Hotel 2</name><rooms>8</rooms><kitchens>0</kitchens><restaurants>0</restaurants></hotel>

And this is my pipeline:

input {
  file {
    path => "C:/elastic_d/logstash/bin/data/data_xml.txt"
    start_position => "beginning"
    sincedb_path => "NUL"
    type => "xml"
  }
}

filter {
  xml { source => "message" store_xml => true target => "theXML" force_array => false }
}


output {
  elasticsearch { 
    hosts => ["localhost:9200"]
    index => "aa"
  }
}

But in Kibana Discover I can see all the data inside a single document, and the second line is not parsed.
I would like to see 2 different documents, the first for "Hotel 1" and the second for "Hotel 2", with the same fields: Name, Rooms, Kitchens and Restaurants (but different values).
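To illustrate the target structure, here is a plain-Python sketch (standard library only, not part of the pipeline) of the two documents I would expect to see:

```python
# Plain-Python sketch of the two documents I would expect in Kibana.
# This only illustrates the target structure; it is not part of the
# Logstash pipeline.
import xml.etree.ElementTree as ET

lines = [
    "<hotel><name>Hotel 1</name><rooms>22</rooms><kitchens>1</kitchens><restaurants>2</restaurants></hotel>",
    "<hotel><name>Hotel 2</name><rooms>8</rooms><kitchens>0</kitchens><restaurants>0</restaurants></hotel>",
]

# One dict per line, i.e. one Elasticsearch document per <hotel>.
docs = [{child.tag: child.text for child in ET.fromstring(line)} for line in lines]
print(docs)
```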

Could anyone help me please?

Question, are those lines within a larger XML document or completely separate / independent lines?

If they are within a single document, can you show us the enclosing XML?

If they are separate lines, is there a carriage return / line feed after the second line to make sure it gets processed?

EDIT: I just ran your conf assuming those were separate / independent lines and it worked: both lines were processed when there is a newline after the 2nd line... if not, I only got 1 line.

Hi @stephenb , thanks a lot for your help and also for the test!!
Yes, I have both tags on the same line. There is no character (like "\r", "\n" or anything else) that would let me split it into a new line. Can I still parse it as 2 separate lines? I don't know if it's nonsense, but online I found the "multiline" plugin (though I don't think I configured it well).

You cannot have two XML documents on the same line. The xml filter should be logging "attempted adding second root element to document". You can split it into two lines using

    mutate { gsub => [ "message", "<hotel>", "
<hotel>" ] }
    split { field => "message" }
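To see why, here is a plain Python sketch (standard library, not Logstash itself): a strict XML parser rejects a second root element, while inserting a newline before each `<hotel>` and splitting yields two independently parseable documents.

```python
# Plain Python sketch (not Logstash): two root elements in one string
# are not well-formed XML, but replacing "<hotel>" with "\n<hotel>"
# and splitting on the newline gives two parseable documents.
import xml.etree.ElementTree as ET

message = (
    "<hotel><name>Hotel 1</name><rooms>22</rooms></hotel>"
    "<hotel><name>Hotel 2</name><rooms>8</rooms></hotel>"
)

try:
    ET.fromstring(message)          # fails: second root element
    well_formed = True
except ET.ParseError:
    well_formed = False

# The equivalent of the gsub + split above:
parts = message.replace("<hotel>", "\n<hotel>").strip().split("\n")
names = [ET.fromstring(part).findtext("name") for part in parts]
print(well_formed, names)
```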

Hi, thanks!
Yes, sorry, I have a file with 2 different lines, but I can't see the new index in Kibana.

This is my "new" pipeline:

input {
  file {
    path => "C:/elastic_d/logstash/bin/data/hotel.txt"
    start_position => "beginning"
    sincedb_path => "NUL"
    type => "xml"
  }
}

filter {
  xml { source => "message" store_xml => true target => "theXML" force_array => false }
  mutate { gsub => [ "message", "<hotel>", "<hotel>" ] }
  split { field => "message" }
}


output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "cc"
  }
}

And this is the log:

[2022-05-21T18:55:30,232][INFO ][logstash.runner          ] Log4j configuration path used is: C:\elastic_d\logstash\config\log4j2.properties
[2022-05-21T18:55:30,241][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.16.2", "jruby.version"=>"jruby 9.2.20.1 (2.5.8) 2021-11-30 2a2962fbd1 OpenJDK 64-Bit Server VM 11.0.13+8 on 11.0.13+8 +indy +jit [mswin32-x86_64]"}
[2022-05-21T18:55:30,350][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2022-05-21T18:55:32,795][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600, :ssl_enabled=>false}
[2022-05-21T18:55:33,737][INFO ][org.reflections.Reflections] Reflections took 111 ms to scan 1 urls, producing 119 keys and 417 values 
[2022-05-21T18:55:37,082][INFO ][logstash.javapipeline    ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, "pipeline.sources"=>["C:/elastic_d/logstash/bin/pipeline.conf"], :thread=>"#<Thread:0x688754d9 run>"}
[2022-05-21T18:55:38,258][INFO ][logstash.javapipeline    ][main] Pipeline Java execution initialization time {"seconds"=>1.17}
[2022-05-21T18:55:38,338][INFO ][logstash.javapipeline    ][main] Pipeline started {"pipeline.id"=>"main"}
[2022-05-21T18:55:38,389][INFO ][filewatch.observingtail  ][main][6852299b3d9de52d8364e03ba907d3baf6fd2cecdbc3e582ad937128be15951b] START, creating Discoverer, Watch with file and sincedb collections
[2022-05-21T18:55:38,429][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}

but I can't see the new index... and I don't know why :frowning:

If the file has the elements on two separate lines then you do not need the mutate and split. If you do need the mutate then you need that literal newline embedded in the replacement string.
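For reference, a filter block with the literal newline embedded in the replacement string (note that the mutate and split must run before the xml filter, so that each event carries a single document) would look like:

```
filter {
  mutate { gsub => [ "message", "<hotel>", "
<hotel>" ] }
  split { field => "message" }
  xml { source => "message" store_xml => true target => "theXML" force_array => false }
}
```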

Without any plugin I can see it in the same document...

Let me share a screenshot of my source file:


We have 2 different lines.

The pipeline (without mutate and split):

input {
  file {
    path => "C:/elastic_d/logstash/bin/data/hotel.txt"
    start_position => "beginning"
    sincedb_path => "NUL"
    type => "xml"
  }
}

filter {
  xml { source => "message" store_xml => true target => "theXML" force_array => false }
}


output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "ee"
  }
}

My Kibana:


1 hit

:sob: :sob: :sob:

Thanks a lot for your help

@Ely_96 Put a return / newline after the second line...
Logstash reads files line by line... since there is no newline on the 2nd line it will not process it until there is one... because until a newline appears, the current line may not be finished being written to...

I can see there is no newline at the end on line 2... put one in... and it will work

Thanksssssssss!!!!! :star_struck:
Yes, now it works (I just loaded 6 hotels)!!!

But the pipeline should be able to load XML received from Filebeat. For now I insert a new line after the last one by hand, but once the process is live I will not be able to do that manually. Can I handle this in any way?

Whatever process is writing that file needs to end the last line with a newline...

Filebeat is also "line oriented": until a line has a newline at the end, that line is not finished... that is pretty common in any line-oriented processing.
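If you can't change the writing process, one workaround is a small post-processing step that appends a newline when the file does not end with one. A sketch (the function name is my own, not a Logstash or Filebeat feature):

```python
# Sketch: make sure a file ends with a newline so line-oriented readers
# (the Logstash file input, Filebeat) pick up the last line.
import os

def ensure_trailing_newline(path):
    """Append a newline if the file's last byte is not b'\\n'.

    Returns True if the file was modified."""
    if os.path.getsize(path) == 0:
        return False                 # nothing to terminate
    with open(path, "rb+") as f:
        f.seek(-1, os.SEEK_END)      # inspect only the last byte
        if f.read(1) != b"\n":
            f.write(b"\n")
            return True
    return False
```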

Thanks a lot for your answer!!
Probably the colleague who gave me the first log just copy/pasted the two lines without paying attention to the trailing newline.
I will check it :slight_smile:

Thank you so much! :blush:


Just add the field names and remove the message and theXML fields:

    mutate {
      add_field => {
        "hotelname" => "%{[theXML][name]}"
        "rooms" => "%{[theXML][rooms]}"
        "kitchens" => "%{[theXML][kitchens]}"
        "restaurants" => "%{[theXML][restaurants]}"
      }
    }
    mutate { remove_field => ["message", "theXML"] }
{
     "@timestamp" => 2022-05-21T22:36:43.639Z,
      "hotelname" => "Hotel 1",
          "rooms" => "22",
    "restaurants" => "2",
       "@version" => "1",
       "kitchens" => "1"
}
{
     "@timestamp" => 2022-05-21T22:36:43.658Z,
      "hotelname" => "Hotel 2",
          "rooms" => "8",
    "restaurants" => "0",
       "@version" => "1",
       "kitchens" => "0"
}

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.