Logstash xml input configuration for multiple documents

Hi Team

I'm using http_poller to poll an endpoint which returns XML data as the response. I'm trying to send this XML data to Elasticsearch.

But when I run Logstash, indexing fails. Please have a look at the config below.

A few lines of my XML data:

>     <?xml version="1.0" encoding="UTF-8"?>
>     <feed xmlns="http://www.w3.org/2005/Atom">
>        <generator version="1.0">Alfresco (1.0)</generator>
>        <link rel="self" href="links" />
>        <id>random_id</id>
>        <title>Activities Site</title>
>        <updated>2020-07-29T12:53:16.000-07:00</updated>
>        <entry xmlns='http://www.w3.org/2005/Atom'>
>           <title type="html"><overview></title>
>           <link rel="alternate" type="text/html" href="random link" />
>           <id>249,535,933</id>
>           <updated>2020-07-29T12:53:16.000-07:00</updated>
>           <summary type="html">
>              <![DATA[<a href="random link</a> downloaded document <a href="random link">Overview</a>]]>
>           </summary>
>           <author>
>              <name>name</name>
>              <uri>random</uri>
>           </author>
>        </entry>
>        <entry xmlns='http://www.w3.org/2005/Atom'>
>           <title type="html"><random></title>
>           <link rel="alternate" type="text/html" href="randomuri" />
>           <id>249,535,867</id>
>           <updated>2020-07-29T12:53:10.000-07:00</updated>
>           <summary type="html">
>              <![CDATA[<a href="random">Name</a> download <a href="random">intro</a>]]>
>           </summary>
>           <author>
>              <name>Name</name>
>              <uri>random</uri>
>           </author>
>        </entry>

Logstash.conf:

input {
	http_poller {
		urls => {
			test1 => {
				url => "randomhost"
				method => get
				user => "*********"
				password => "*******"
				headers => {
					"Content-Type" => "text/xml; charset=UTF-8"
				}
			}
		}
		request_timeout => 60
		schedule => { cron => "* * * * * UTC" }
	}
}
filter {
	xml {
		source => "message"
		target => "theXML"
	}
}

#output { stdout { codec => rubydebug } }

output {

    elasticsearch {
      index => "logstash-xmldata"
      hosts => "http://elasticsearchhost:80"
      user => "****"
      password => "******"
    }
  }

output:

> [2020-07-29T20:00:02,148][WARN ][logstash.outputs.elasticsearch][main][41e2884444551e51d0256ad578d1476c2186e932e0995e3ce551bbd4c4286a6a] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-xmldata", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x11bd3dd9>], :response=>{"index"=>{"_index"=>"logstash-xmldata", "_type"=>"_doc", "_id"=>"r1MpnHMBz8okLa_0Chk8", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [theXML.title] tried to parse field [null] as object, but found a concrete value"}}}}

Please suggest.

The problem appears to be with [theXML][title]

<title>Activities Site</title>

that is a string (a "concrete value") but the mapping in elasticsearch expects it to be an object.

Check the mapping in elasticsearch.
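For example, in Kibana Dev Tools (the index name is taken from your output config):

```
GET logstash-xmldata/_mapping
```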

Read this post and then this post.


Hi @Badger

Got it. I'm able to ingest the data and see it in Kibana by following the details in the suggested posts.

But in Kibana I see all the data in one xml field. Any suggestions on this?

Logstash will have placed all of the parsed XML inside the top-level theXML field. If you want the object to be moved to the top level you can use a ruby filter, like this.
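The idea boils down to copying each key out of theXML to the top level and then deleting theXML itself. A plain-Ruby sketch of the same transformation on a bare Hash (field values here are made up for illustration):

```ruby
# Stand-in for the Logstash event: a Hash with the parsed XML nested
# under "theXML", the way the xml filter's target option produces it.
event = {
  "message" => "<feed>...</feed>",
  "theXML"  => { "id" => "random_id", "title" => "Activities Site" }
}

# Promote every key under theXML to the top level, then drop theXML itself.
event["theXML"].each { |k, v| event[k] = v }
event.delete("theXML")
```

Inside a real ruby filter the same loop uses event.get and event.set instead of plain Hash access.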

Hi @Badger

This is the Kibana output without the ruby filter; all the data is under the theXML.entry field.

This is the Kibana output with the ruby filter below:

ruby {
	code => '
		event.get("theXML").each { |k, v|
			event.set(k, v)
		}
		event.remove("theXML")
	'
}

Now the data is under the entry field.

Do I need to make any changes to the ruby filter? Something like event.get("theXML.entry")?
Also, how about entry.updated? Will that be parsed as well?

Please suggest

Thank you

Well, the sample data you posted is not valid XML, and I suspect the structure is different as you get through more entries.

You might want to use the 'force_array => false' option on the xml filter.

Since there are multiple <entry> elements that is always going to be an array. You might want to use a split filter to break those up into separate events. Maybe not, depends on your use case.
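A minimal sketch of that, assuming the entries end up in a top-level entry field once they are moved out of theXML:

```
split {
  field => "entry"
}
```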

How you end up with an entry.updated array I cannot guess.

I posted only the first few lines of the data, so it looks invalid. I tried converting it to JSON (using external editors) and that worked.

Let me try these options and see. However, I'm still wondering about entry.updated and other similar fields.

Will let you know If i find anything interesting.

Thanks
Rahul

Hi @Badger

The following filter config using split worked well.

filter {
	xml {
		source => "message"
		target => "theXML"
		force_array => false
	}
	ruby {
		code => '
			event.get("theXML").each { |k, v|
				event.set(k, v)
			}
			event.remove("theXML")
		'
	}
	split {
		field => "entry"
		remove_field => "message"
	}
}

The data in Kibana is good, but I see a _jsonparsefailure tag. Is there any way I can understand what is failing?

I think that comes from your input. You didn't set the codec parameter, so it tried to use its default (json).


Oh yeah, got it. Makes sense. @Jenni

Is there a way to specify xml? I didn't see it in the documentation.

I didn't see anything either. I think you can just use plain for this, as you already have an xml filter anyway.
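For example, inside the http_poller input block:

```
codec => plain
```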


sure @Jenni

Thank you :slight_smile:

Hi @Badger @Jenni

The above conf worked, but the message field is being added to every document (in ES) with all the raw data. Any inputs on how to avoid this?

(Edit: There was a wrong test and an assumption that split doesn't call remove_field if there was only one entry. But Badger proved me wrong below. This is long and unnecessary, so I am getting rid of it. Have a look at the edit history of this post if you are interested in my idiocy :slight_smile: )

If you move the remove_field option to a separate mutate filter, it should work.


I would add the remove_field => [ "message" ] to the xml filter, so that it is only removed if it is successfully parsed.
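So the xml filter would end up looking something like this:

```
xml {
  source => "message"
  target => "theXML"
  force_array => false
  remove_field => [ "message" ]
}
```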

@Jenni, the split filter will not decorate the event (i.e. filter_matched is not called) if the field is a string that does not contain the terminator.


Ah. Sorry. Thanks. I had wrongly assumed that add_field would keep my array as an array.

(But a feed with only one entry could still cause problems with split because it would be a hash instead of an array, wouldn't it?)

Adding the remove_field as a separate filter worked as well. However, adding it in the xml filter makes more sense.

Thank you both :slight_smile:

If the field were a hash you would get

logger.warn("Only String and Array types are splittable. field:#{@field} is of type = #{original_value.class}")
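One way to guard against that single-entry case (a sketch, not from the thread): normalize the field to an Array before split runs. Note that Kernel#Array turns a Hash into an array of key/value pairs, so an explicit class check is safer:

```ruby
# A single <entry> parses to a Hash; multiple entries parse to an Array.
entry_single = { "title" => "only entry" }
entry_multi  = [{ "title" => "first" }, { "title" => "second" }]

# Array() would turn the Hash into [["title", "only entry"]], so wrap explicitly.
normalize = ->(v) { v.is_a?(Array) ? v : [v] }

normalize.call(entry_single)  # => [{ "title" => "only entry" }]
normalize.call(entry_multi)   # => unchanged
```

In the pipeline this could live in the existing ruby filter (wrapping the entry value in an array when it is a Hash) so that the split filter always sees an Array.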