I'm trying to index data from an XML feed into Elasticsearch, but I only want specific fields, and I want them renamed.
I'm using the mutate filter to map the XML fields to my own field names. My problem is with the photos field. This is my Logstash configuration:
input {
  http_poller {
    urls => {
      test => "https://super-secret-url.com"
    }
    schedule => { cron => "* * * * * UTC" }
    codec => "plain"
  }
}
filter {
  # Transform the XML file into a data structure with a field named 'source_file_products' containing an array of XML strings, one per product
  xml {
    source => "message"
    force_array => false
    xpath => [ "/dane/produkty/p", "source_file_products" ]
    store_xml => false
    remove_field => "message"
  }
  # Split the array of XML strings into separate events
  split { field => "source_file_products" }
  # Parse each product's XML into a data structure
  xml {
    source => "source_file_products"
    force_array => false
    target => "source_file_product"
    remove_field => ["source_file_products", "@timestamp", "@version"]
  }
  # Keep only the relevant fields and map them to my own names
  mutate {
    add_field => {
      "_id" => "%{[source_file_product][id]}"
      "name" => "%{[source_file_product][nazwa]}"
      "price" => "%{[source_file_product][cena]}"
      "shortDescription" => "%{[source_file_product][html_description]}"
      "ean" => "%{[source_file_product][kod_ean]}"
      "mpn" => "%{[source_file_product][kod_producenta]}"
      "photos" => "%{[source_file_product][photos][0][url]}"
    }
    remove_field => "source_file_product"
  }
}
output {
  file {
    path => "./test.ndjson"
    codec => "json_lines"
  }
}
The data structure of [source_file_product][photos] looks like this:
"photos" => [
  [0] {
    "foo" => "bar",
    "attr1" => "1",
    "url" => "First link to a photo"
  },
  [1] {
    "url" => "Second link to a photo"
  },
  [2] {
    "more" => "clutter",
    "url" => "Third link to a photo"
  },
  [3] {
    "url" => "Fourth link to a photo"
  }
]
But I need:
"photos" => [
  "First link to a photo",
  "Second link to a photo",
  "Third link to a photo",
  "Fourth link to a photo"
]
The issue is that I don't know how many photos will be in this array, so I can't hardcode the indexes the way I do with [0] in the mutate filter.
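To make the mapping I'm after concrete, here is a minimal plain-Ruby sketch of it. This is only an illustration, not a tested Logstash solution: the helper name extract_photo_urls is made up, and I'm assuming that photos is normally an array of hashes but may arrive as a single hash (since force_array => false is set) or be missing entirely.

```ruby
# Hypothetical helper (name is my own): collect the "url" value of every
# photo entry, however many entries there are.
def extract_photo_urls(photos)
  return [] if photos.nil?

  # Assumption: with force_array => false a single photo may come through
  # as one hash instead of a one-element array, so normalise it first.
  entries = photos.is_a?(Array) ? photos : [photos]

  # Keep only hash entries, take their "url", and drop entries without one.
  entries.select { |photo| photo.is_a?(Hash) }
         .map { |photo| photo["url"] }
         .compact
end

# Sample data copied from the structure shown above.
photos = [
  { "foo" => "bar", "attr1" => "1", "url" => "First link to a photo" },
  { "url" => "Second link to a photo" },
  { "more" => "clutter", "url" => "Third link to a photo" },
  { "url" => "Fourth link to a photo" }
]

puts extract_photo_urls(photos).inspect
```

I suspect the same logic could run inside a Logstash ruby filter, reading the array with event.get("[source_file_product][photos]") and writing the result with event.set("photos", ...) before source_file_product is removed, but I haven't verified that against this pipeline.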