I'm stuck formatting data while trying to send an XML file to Elasticsearch

Sourtoast · July 6, 2021, 10:25pm

I'm trying to load data from an XML file into Elasticsearch, but I want only specific fields, with their names changed.
I'm using the mutate filter to map the XML fields to my own fields. My problem is with the photos field.

input {
	http_poller {
		urls => {
			test => "https://super-secret-url.com"
		}
		schedule => { cron => "* * * * * UTC" }
		codec => "plain"
	}
}

filter {
	# Transform XML file to datastructure with field named 'source_file_products' containing array of XML strings for every product
	xml {
		source => "message"
		force_array => false
		xpath => [ "/dane/produkty/p", "source_file_products" ]
		store_xml => false
		remove_field => "message"
	}
	# Split array of XML strings as different events
	split { field => "source_file_products" }
	# Parse XML to datastructure
	xml {
		source => "source_file_products"
		force_array => false
		target => "source_file_product"
		remove_field => ["source_file_products", "@timestamp", "@version"]
	}
	# Get only relevant fields and map them respectively
	mutate {
		add_field => { 
			"_id" => "%{[source_file_product][id]}"
			"name" => "%{[source_file_product][nazwa]}"
			"price" => "%{[source_file_product][cena]}"
			"shortDescription" => "%{[source_file_product][html_description]}"
			"ean" => "%{[source_file_product][kod_ean]}"
			"mpn" => "%{[source_file_product][kod_producenta]}"
			"photos" => "%{[source_file_product][photos][0][url]}"
		}
		remove_field => "source_file_product"
	}
}
output {
	file {
		path => "./test.ndjson"
		codec => "json_lines"
	}
}

Data structure of [source_file_product][photos] looks like this:

"photos" => [
	[0] {
		"foo" => "bar",
		"attr1" => "1",
		"url" => "First link to a photo"
	},
	[1] {
		"url" => "Second link to a photo"
	},
	[2] {
		"more" => "clutter",
		"url" => "Third link to a photo"
	},
	[3] {
		"url" => "Fourth link to a photo"
	}
]

But I need:

"photos" => [
	"First link to a photo",
	"Second link to a photo",
	"Third link to a photo",
	"Fourth link to a photo"
]

The issue is that I don't know how many photos will be in this array.

Badger · July 6, 2021, 11:23pm

You can do that in a ruby filter. I have not tested it, but something like this:

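ruby {
    code => '
        # Replace the array of photo hashes with an array of their "url" values
        p = event.get("[source_file_product][photos]")
        if p.is_a? Array
            newP = []
            # each yields the elements themselves (each_index would yield integers)
            p.each { |x|
                newP << x["url"]
            }
            event.set("[source_file_product][photos]", newP)
        end
    '
}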
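
The reshaping asked for in the question can also be tried outside Logstash in plain Ruby, which makes it easier to check before putting it into a ruby filter. The following is only a minimal sketch: the helper name extract_photo_urls is hypothetical, and the sample data is copied from the structure shown in the question. It also assumes that, with force_array => false, a product with a single photo may arrive as a Hash rather than an Array, so it wraps that case.

```ruby
# Hypothetical helper (not part of Logstash): returns the "url" value of
# every photo entry, regardless of how many entries there are.
def extract_photo_urls(photos)
  # Assumption: with force_array => false a lone photo may be a Hash,
  # so wrap anything that is not already an Array.
  entries = photos.is_a?(Array) ? photos : [photos]
  # Keep only Hash entries, pull out "url", and drop entries without one.
  entries.select { |e| e.is_a?(Hash) }.map { |e| e["url"] }.compact
end

# Sample data mirroring [source_file_product][photos] from the question.
photos = [
  { "foo" => "bar", "attr1" => "1", "url" => "First link to a photo" },
  { "url" => "Second link to a photo" },
  { "more" => "clutter", "url" => "Third link to a photo" },
  { "url" => "Fourth link to a photo" }
]

urls = extract_photo_urls(photos)
puts urls.inspect
```

Inside a ruby filter the same map-based logic would be applied to the value returned by event.get and written back with event.set. It would have to run before the mutate filter removes source_file_product.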

system · August 3, 2021, 11:24pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
