Is there a way to process an attachment conditionally for a nested object that's represented by an array?
I looked into the foreach processor with a conditional if, but the ctx object does not appear to hold a reference to the current array position, so I cannot find a way to write the if condition so that it's valid.
I've also been playing around with the script processor to see if I could find a way to loop through the property and conditionally process each item, but I can't find a way to access the plugin logic from the Painless script.
Here's a simplified version of the index I'm using:
{
"mappings": {
"properties": {
"title": { "type": "text" },
"keywords": { "type": "keyword" },
"article": {
"type": "nested",
"properties": {
"id": {
"type": "text"
},
"content": {
"type": "text"
},
"type": {
"type": "text"
}
}
}
}
},
"settings": {
"number_of_shards": 1,
"number_of_replicas": 2
}
}
An article document might look like this:
{
"title": "My article"
, "keywords": []
, "article": [
{"type": "text", "id": "basic", "content": "Some text"}
, {"type": "attachment", "id": "file-id-1", "content": "base64 encoded data..."}
, {"type": "attachment", "id": "file-id-2", "content": "base64 encoded data..."}
]
}
My goal is to process the article nested object, but only process the array items that have a type equal to attachment.
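To make the intended behavior concrete, here is a minimal Python sketch of the logic I'm after, written outside of Elasticsearch. The extract_text helper is hypothetical: it only base64-decodes the content as a stand-in for whatever the attachment processor would actually extract, and process_article is just an illustrative name, not anything from the ingest API.

```python
import base64


def extract_text(b64_data):
    # Hypothetical stand-in for the attachment processor.
    # Assumption: the payload is base64-encoded UTF-8 text; the real
    # plugin would run text extraction on arbitrary binary files.
    return base64.b64decode(b64_data).decode("utf-8")


def process_article(doc):
    # Walk the nested "article" array and touch only attachment items.
    for item in doc.get("article", []):
        if item.get("type") == "attachment":
            # Replace the base64 data with the extracted text.
            item["content"] = extract_text(item["content"])
    return doc
```

Items whose type is text should pass through unchanged; only the attachment items should have their content replaced.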
My first attempt was to add a conditional statement to my pipeline:
{
"description" : "Extract attachment information from arrays",
"processors" : [
{
"foreach": {
"field": "article",
"processor": {
"attachment": {
"if": "ctx.article.type == \"attachment\""
, "target_field": "_ingest._value.content"
, "field": "_ingest._value.content"
}
}
}
}
]
}
However, the ctx variable holds the root document object, and I cannot figure out a way to access the current index in the article collection.
Next, I tried implementing a script, but I cannot figure out how to programmatically trigger the attachment processor. The following code is the basic version I was working from. It does successfully remove the content key for attachments, but what I want to do is replace the base64 data with the extracted text.
{
"description" : "Extract attachment information from arrays"
, "processors" : [
{
"foreach": {
"field": "article",
"processor": {
"script": {
"lang": "painless"
, "source": #serializeJSON('
ctx.article.stream()
.filter(x -> x.type == "attachment")
.forEach(x -> x.remove("content"))
;
')#
}
}
}
}
]
}
Is there a way to do what I want?