Tuning Attachment Ingest with arrays (get rid of the raw data!)

I'm using the following pipeline to ingest an array of documents:

PUT _ingest/pipeline/attachment
{
  "description": "Extract attachment information",
  "processors": [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "field": "_ingest._value.data",
            "target_field": "_ingest._value.attachment",
            "properties": [ "content" ]
          }
        }
      }
    }
  ]
}    

Here's a sample request:

PUT jh_index/my_type/my_id?pipeline=attachment
{
  "attachments": [
    {
      "data": "_encoded document - large amount of base64 guff_"
    },
    {
      "data": "_another encoded document - even more base64 guff_"
    }
  ]
}

It works like a champ; however, what I end up with is:

{
  "_index": "jh_index",
  "_type": "my_type",
  "_id": "my_id",
  "_version": 15,
  "found": true,
  "_source": {
    "attachments": [
      {
        "data": "_Large amount of base64 guff I don't want_",
        "attachment": {
          "content": "NEW HEADING1\nLorem ipsum dolor ......."
        }
      },
      {
        "data": "_Another large amount of base64 guff I don't want_",
        "attachment": {
          "content": "HEADING1\nClick Insert and then choose the ........."
        }
      }
    ]
  }
}

I've tried using "processors" rather than "processor" so that I can also add a remove processor:

{
   "remove": {"field": "_ingest._value.data"}
}

However, that option seems to have been removed on purpose: "Modify foreach processor to accept a single processor instead of collection" #19345.

I also don't seem to be able to put two "foreach" processors one after the other in a pipeline.

How do I remove the attachments.data field?

Thx

J/.

I haven't tried it, but I don't see why you wouldn't be able to add a second foreach.

Did you try it?

If so, share what you tried.

Here is an example of a pipeline that defines two foreach processors:

POST _ingest/pipeline/_simulate
{
  "pipeline" : {
    "processors" : [
      {
        "foreach" : {
          "field": "field",
          "processor" : {
            "uppercase" : { "field" : "_ingest._value.data" }
          }
        }
      },
      {
        "foreach" : {
          "field": "field",
          "processor" : {
            "remove" : { "field" : "_ingest._value.data" }
          }
        }
      }
    ]
  },
  "docs" : [
    {
      "_source" : {
        "field": [{"data": "a"}, {"data": "b"}, {"data": "c"}]
      }
    }
  ]
}

That seems to work for removing the data field.

Kibana won't let you add more than one "foreach". I'm on Elastic Cloud, so it's the latest version.

This is what I'm trying:

PUT _ingest/pipeline/attachment
{
  "description": "Extract attachment information",
  "processors": [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "field": "_ingest._value.data",
            "target_field": "_ingest._value.attachment",
            "properties": [ "content" ]
          }
        }
      },
      "foreach": {
        "field": "attachments",
        "processor": {
          "field": "field",
          "processor" : {
            "remove" : { "field" : "_ingest._value.data" }
          }

          }
        }
      }
  ]
}

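I was missing a closing curly bracket: each "foreach" has to be its own object in the "processors" array, and its "processor" takes the "remove" processor directly. This code works: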

Thank you so very much, Tal Levy!

PUT _ingest/pipeline/attachment
{
  "description": "Extract attachment information",
  "processors": [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "field": "_ingest._value.data",
            "target_field": "_ingest._value.attachment",
            "properties": [ "content" ]
          }
        }
      }
    },
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "remove": { "field": "_ingest._value.data" }
        }
      }
    }
  ]
}
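To check the stored pipeline before indexing real documents, one option is to run a test document through it with the simulate API (`POST _ingest/pipeline/attachment/_simulate`). A minimal sketch of the request body is below. The two `data` values are just the base64 encodings of the placeholder strings "Hello world" and "Goodbye world", standing in for real attachments; they are illustrative, not taken from the thread.

```json
{
  "docs": [
    {
      "_source": {
        "attachments": [
          { "data": "SGVsbG8gd29ybGQ=" },
          { "data": "R29vZGJ5ZSB3b3JsZA==" }
        ]
      }
    }
  ]
}
```

If the pipeline behaves as intended, each element of `attachments` in the simulated `_source` should contain an `attachment.content` field and no `data` field.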