I see.
That's indeed not doable AFAIK. Maybe it's something we could support as an option, for example by reading the limit from the document itself through a new setting like field_indexed_chars.
Then we could do something like:
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information. Used to parse pdf and office files",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "field_indexed_chars" : "size"
      }
    }
  ]
}
Then index either:
PUT index/doc/1?pipeline=attachment
{
  "data": "BASE64"
}
This would use the default value (or the one defined by indexed_chars).
Or:
PUT index/doc/2?pipeline=attachment
{
  "data": "BASE64",
  "size": 1000
}
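To make the proposed fallback concrete, here is a minimal Python sketch of how the processor might pick the extraction limit for each document. Everything here is hypothetical: field_indexed_chars is only a proposal, resolve_indexed_chars and truncate_extracted are illustrative helpers, and the 100000 default is assumed to mirror the documented indexed_chars default.

```python
from typing import Any, Mapping, Optional

# Assumed processor-level default, mirroring indexed_chars (hypothetical here).
DEFAULT_INDEXED_CHARS = 100_000


def resolve_indexed_chars(
    doc: Mapping[str, Any],
    field_indexed_chars: Optional[str] = None,
    indexed_chars: int = DEFAULT_INDEXED_CHARS,
) -> int:
    """Hypothetical: use the per-document limit if the named field is present,
    otherwise fall back to the processor-level indexed_chars value."""
    if field_indexed_chars is not None:
        value = doc.get(field_indexed_chars)
        if value is not None:
            return int(value)
    return indexed_chars


def truncate_extracted(text: str, limit: int) -> str:
    """Keep at most `limit` characters; a negative limit (e.g. -1) means no limit."""
    if limit < 0:
        return text
    return text[:limit]


if __name__ == "__main__":
    doc1 = {"data": "BASE64"}
    doc2 = {"data": "BASE64", "size": 1000}
    print(resolve_indexed_chars(doc1, "size"))  # falls back to the default
    print(resolve_indexed_chars(doc2, "size"))  # uses the document's "size"
```

With this design the per-document field only overrides the limit when it is actually present, so existing documents keep the current behaviour.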
Would you like to open a feature request for it?