How to control the "_indexed_chars" value on an Ingest Attachment pipeline?

I see.

That's indeed not doable AFAIK. Maybe it's something we could support as an option: read the limit from the document itself, via a new setting such as field_indexed_chars, which would name the document field that holds the limit.

Then we could do something like:

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information. Used to parse pdf and office files",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "field_indexed_chars" : "size"
      }
    }
  ]
}

Then index either:

PUT index/doc/1?pipeline=attachment
{
  "data": "BASE64"
}

This would use the default value (or the one defined by indexed_chars), since the document has no size field.

Or:

PUT index/doc/2?pipeline=attachment
{
  "data": "BASE64",
  "size": 1000
}
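
To make the proposed fallback order concrete, here is a rough Python sketch of how the limit could be resolved per document. It is only an illustration of the idea, not how the processor is implemented: field_indexed_chars is the setting proposed above (it does not exist today), and resolve_indexed_chars and truncate are hypothetical helper names. The 100000 default and the "-1 means no limit" convention are taken from the existing indexed_chars setting.

```python
from typing import Any, Mapping

# Default of the existing indexed_chars setting on the attachment processor.
DEFAULT_INDEXED_CHARS = 100_000


def resolve_indexed_chars(doc: Mapping[str, Any], config: Mapping[str, Any]) -> int:
    """Pick the extraction limit for one document (hypothetical helper).

    Order: per-document field named by field_indexed_chars (proposed setting),
    then the processor-level indexed_chars, then the default.
    """
    fallback = int(config.get("indexed_chars", DEFAULT_INDEXED_CHARS))
    field_name = config.get("field_indexed_chars")  # proposed, not a real setting
    if field_name is None:
        return fallback
    value = doc.get(field_name)
    if value is None:
        # The document does not carry its own limit.
        return fallback
    return int(value)


def truncate(text: str, limit: int) -> str:
    """Apply the limit to extracted text; a negative limit means no limit."""
    return text if limit < 0 else text[:limit]


processor = {"field": "data", "field_indexed_chars": "size"}
doc_1 = {"data": "BASE64"}                # no size field: falls back to the default
doc_2 = {"data": "BASE64", "size": 1000}  # carries its own limit
limit_1 = resolve_indexed_chars(doc_1, processor)
limit_2 = resolve_indexed_chars(doc_2, processor)
```

With the two documents above, doc 1 would get the processor-level limit and doc 2 would get 1000.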

Would you like to open a feature request for it?
