Ingest Attachment processor pipeline for arrays, without storing base64 data

Hello,

How can I index multiple attachments within one document, as explained in https://www.elastic.co/guide/en/elasticsearch/plugins/7.5/ingest-attachment-with-arrays.html , but without storing the base64 data?

My ingest pipeline looks like this:
PUT _ingest/pipeline/attachment
{
  "processors": [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "field": "_ingest._value.data",
            "ignore_failure": true,
            "properties": [
              "content"
            ],
            "target_field": "_ingest._value.attachment"
          }
        }
      }
    }
  ]
}

I tried the solution given in "Ingest Attachment processor pipeline, but without storing base64 data", trying a few variants like the one below, but it doesn't seem to work when combined with the foreach construction.

PUT _ingest/pipeline/attachment
{
  "processors": [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "field": "_ingest._value.data",
            "ignore_failure": true,
            "properties": [
              "content"
            ],
            "target_field": "_ingest._value.attachment"
          }
        }
      }
    },
    {
      "remove": {
        "field": "attachments.data"
      }
    }
  ]
}

With the pipeline above I got the error "[data] is not an integer, cannot be used as an index as part of path", which doesn't make much sense to me.

Is there a different ingest pipeline config I could use to index an array of attachments without their base64 contents?

Welcome!

Please format your code, logs or configuration files using the </> icon, as explained in this guide, and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.

I tried a simple remove example (without using the attachment plugin) and was not able to reproduce the behavior you mentioned:

POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "attachments": { "data": ["foo", "bar"] }
      }
    }],
  "pipeline": {
    "processors": [
      {
        "remove": {
          "field": "attachments.data"
        }
      }
    ]
  }
}

Could you provide a similar API call which helps to diagnose your problem?
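
One likely reason the call above does not reproduce the error: it models `attachments` as a single object, whereas in the original pipeline `attachments` is an array, so the path `attachments.data` would be asking for an element named `data` of a list. Below is a minimal sketch of a variant that keeps the array shape and moves the `remove` into its own `foreach`, addressing each element through `_ingest._value`. The `filename` values and the placeholder `data` strings are assumptions for illustration only, and the `attachment` processor is left out so the call can run without the plugin:

```
# Sketch only: the sample filenames and data values are illustrative assumptions.
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "attachments": [
          { "filename": "a.txt", "data": "foo" },
          { "filename": "b.txt", "data": "bar" }
        ]
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "foreach": {
          "field": "attachments",
          "processor": {
            "remove": {
              "field": "_ingest._value.data"
            }
          }
        }
      }
    ]
  }
}
```

If this behaves as expected, the same `foreach`/`remove` pair could be appended after the `attachment` `foreach` in the real pipeline, so that the extracted content is kept and only the base64 `data` field of each array element is dropped.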
