rahulnama
(Rahul Nama)
July 29, 2020, 8:07pm
1
Hi Team,
I'm using http_poller to poll an endpoint that returns XML data as the response. I'm trying to send this XML data to Elasticsearch.
But when I run Logstash, I see it failing to index the events. Please have a look at the config below.
A few lines of my XML data:
> <?xml version="1.0" encoding="UTF-8"?>
> <feed xmlns="http://www.w3.org/2005/Atom">
> <generator version="1.0">Alfresco (1.0)</generator>
> <link rel="self" href="links" />
> <id>random_id</id>
> <title>Activities Site</title>
> <updated>2020-07-29T12:53:16.000-07:00</updated>
> <entry xmlns='http://www.w3.org/2005/Atom'>
> <title type="html"><overview></title>
> <link rel="alternate" type="text/html" href="random link" />
> <id>249,535,933</id>
> <updated>2020-07-29T12:53:16.000-07:00</updated>
> <summary type="html">
> <![DATA[<a href="random link</a> downloaded document <a href="random link">Overview</a>]]>
> </summary>
> <author>
> <name>name</name>
> <uri>random</uri>
> </author>
> </entry>
> <entry xmlns='http://www.w3.org/2005/Atom'>
> <title type="html"><random></title>
> <link rel="alternate" type="text/html" href="randomuri" />
> <id>249,535,867</id>
> <updated>2020-07-29T12:53:10.000-07:00</updated>
> <summary type="html">
> <![CDATA[<a href="random">Name</a> download <a href="random">intro</a>]]>
> </summary>
> <author>
> <name>Name</name>
> <uri>random</uri>
> </author>
> </entry>
Logstash.conf:
input {
  http_poller {
    urls => {
      test1 => {
        url => "randomhost"
        method => get
        user => "*********"
        password => "*******"
        headers => {
          "Content-Type" => "text/xml; charset=UTF-8"
        }
      }
    }
    request_timeout => 60
    schedule => { cron => "* * * * * UTC" }
  }
}
filter {
  xml {
    source => "message"
    target => "theXML"
  }
}
#output { stdout { codec => rubydebug } }
output {
  elasticsearch {
    index => "logstash-xmldata"
    hosts => "http://elasticsearchhost:80"
    user => "****"
    password => "******"
  }
}
Logstash log output:
> [2020-07-29T20:00:02,148][WARN ][logstash.outputs.elasticsearch][main][41e2884444551e51d0256ad578d1476c2186e932e0995e3ce551bbd4c4286a6a] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-xmldata", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x11bd3dd9>], :response=>{"index"=>{"_index"=>"logstash-xmldata", "_type"=>"_doc", "_id"=>"r1MpnHMBz8okLa_0Chk8", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [theXML.title] tried to parse field [null] as object, but found a concrete value"}}}}
Please suggest.
Badger
July 29, 2020, 8:10pm
2
The problem appears to be with [theXML][title]:
<title>Activities Site</title>
That is a string (a "concrete value"), but the mapping in Elasticsearch expects it to be an object.
Check the mapping in Elasticsearch.
Read this post and then this post.
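For example, you could inspect the current mapping with GET logstash-xmldata/_mapping (Kibana Dev Tools syntax; any REST client works). If the index only holds test data, the quickest fix is usually to DELETE logstash-xmldata and let it be re-created once the field conflict is resolved.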
rahulnama
(Rahul Nama)
July 29, 2020, 8:38pm
3
Hi @Badger
Got it. I'm able to ingest the data into Elasticsearch by following the details in the suggested posts.
But in Kibana I see all the data in one XML field. Any suggestions on this?
Badger
July 29, 2020, 9:05pm
4
Logstash will have placed all of the parsed XML inside the top-level theXML field. If you want the contents moved to the top level you can use a ruby filter, like this.
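A minimal sketch of that kind of filter (assuming the parsed XML sits under a top-level theXML field, as in your config):
ruby {
  code => '
    # copy every key under theXML to the top level, then drop the wrapper
    event.get("theXML").each { |k, v|
      event.set(k, v)
    }
    event.remove("theXML")
  '
}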
rahulnama
(Rahul Nama)
July 29, 2020, 9:40pm
5
Hi @Badger
This is the Kibana output without the ruby filter; all the data is under the theXML.entry field.
[
]
This is the Kibana output with the ruby filter below:
ruby {
  code => '
    event.get("theXML").each { |k, v|
      event.set(k,v)
    }
    event.remove("theXML")
  '
}
Now the data is under the entry field.
Do I need to make any changes to the ruby filter? Something like event.get(theXML.entry)?
Also, how about entry.updated? Will that be parsed as well?
Please suggest
Thank you
Badger
July 29, 2020, 10:20pm
6
Well, the sample data you posted is not valid XML, and I suspect the structure is different as you get through more entries.
You might want to use the 'force_array => false' option on the xml filter.
Since there are multiple <entry> elements, that is always going to be an array. You might want to use a split filter to break those up into separate events. Maybe not; it depends on your use case.
How you end up with an entry.updated array I cannot guess.
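Put together, those two suggestions would look something like this (a sketch only; the split field assumes the entries end up under [theXML][entry]):
filter {
  xml {
    source => "message"
    target => "theXML"
    # keep single-valued elements as strings rather than one-element arrays
    force_array => false
  }
  # emit one event per <entry> element
  split {
    field => "[theXML][entry]"
  }
}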
rahulnama
(Rahul Nama)
July 29, 2020, 10:32pm
7
I only posted the first few lines of the data, so it looks invalid. I tried converting the full response to JSON (using external editors) and that worked.
Let me try these options and see. However, I'm still wondering about entry.updated and other similar fields.
Will let you know if I find anything interesting.
Thanks
Rahul
rahulnama
(Rahul Nama)
July 30, 2020, 3:34pm
8
Hi @Badger
The following filter config using split worked well.
filter {
  xml {
    source => "message"
    target => "theXML"
    force_array => false
  }
  ruby {
    code => '
      event.get("theXML").each { |k, v|
        event.set(k,v)
      }
      event.remove("theXML")
    '
  }
  split {
    field => "entry"
    remove_field => "message"
  }
}
The data in Kibana is good, but I see a _jsonparsefailure tag. Is there any way I can understand what is failing?
Jenni
July 30, 2020, 4:01pm
9
I think that comes from your input. You didn't set the codec parameter, so it tried to use its default, which is json. A JSON codec can't parse an XML body, so the event gets tagged with _jsonparsefailure.
rahulnama
(Rahul Nama)
July 30, 2020, 4:16pm
10
Oh yeah, got it. Makes sense, @Jenni.
Is there a way to specify an xml codec? I didn't see one in the documentation.
Jenni
July 30, 2020, 4:45pm
11
I didn't see anything either. I think you can just use the plain codec for this, as you already have an xml filter anyway.
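That would just be one extra line in the input (a sketch of the relevant part only):
input {
  http_poller {
    # ... urls, request_timeout, schedule as before ...
    # hand the raw XML body to the pipeline as plain text
    codec => plain
  }
}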
rahulnama
(Rahul Nama)
July 31, 2020, 4:30pm
13
rahulnama:
filter {
  xml {
    source => "message"
    target => "theXML"
    force_array => false
  }
  ruby {
    code => '
      event.get("theXML").each { |k, v|
        event.set(k,v)
      }
      event.remove("theXML")
    '
  }
  split {
    field => "entry"
    remove_field => "message"
  }
}
Hi @Badger @Jenni
The above conf worked, but the message field (with all the raw data) is being added to every document in Elasticsearch. Any inputs on how to avoid this?
Jenni
July 31, 2020, 4:56pm
14
(Edit: There was a wrong test and assumption here that split doesn't call remove_field if there was only one entry. But Badger proved me wrong below. It was long and unnecessary, so I am getting rid of it. Have a look at the edit history of this post if you are interested in my idiocy.)
If you move the remove_field option to a separate mutate filter, it should work.
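That is, something along these lines:
mutate {
  remove_field => [ "message" ]
}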
Badger
July 31, 2020, 5:01pm
15
I would add the remove_field => [ "message" ] to the xml filter, so that the field is only removed if the XML is successfully parsed.
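Applied to the filter above, that would look something like:
xml {
  source => "message"
  target => "theXML"
  force_array => false
  # only removed when the XML parses successfully
  remove_field => [ "message" ]
}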
@Jenni, the split filter will not decorate the event (i.e., filter_matched is not called) if the field is a string that does not contain the terminator.
Jenni
July 31, 2020, 5:05pm
16
Ah. Sorry. Thanks. I had wrongly assumed that add_field would keep my array as an array.
(But a feed with only one entry could still cause problems with split because it would be a hash instead of an array, wouldn't it?)
rahulnama
(Rahul Nama)
July 31, 2020, 6:59pm
17
Adding the remove_field as a separate mutate filter worked as well. However, adding it in the xml filter makes more sense.
Thank you both
Badger
July 31, 2020, 7:35pm
18
If the field were a hash you would get:
logger.warn("Only String and Array types are splittable. field:#{@field} is of type = #{original_value.class}")
system
(system)
Closed
August 28, 2020, 7:35pm
19
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.