How to control the "_indexed_chars" value on an Ingest Attachment pipeline?

Hi,

I created a pipeline to ingest Office/PDF files with the Ingest Attachment plugin, without defining a value for "indexed_chars" (so I assume the default value of 100k chars is used).

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information. Used to parse pdf and office files",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}

Some of my users want to be able to use a per-document value, as described here for the Mapper Attachments plugin.

Is it possible to do the same with the Ingest Attachment plugin?

See https://www.elastic.co/guide/en/elasticsearch/plugins/6.2/using-ingest-attachment.html

Thank you David,
I read this doc already, but it doesn't answer the question about the "per document" aspect.

If "indexed_chars" can only be set at the pipeline definition level, I would have to set it to "-1" so that all my users can index anything... but then there is a risk of crashing the ingest node if somebody sends an extremely big file.

That's why I wanted to know if "this" was doable with Ingest Attachment plugin or not.

Thanks in advance!

I see.

That's indeed not doable AFAIK. Maybe it's something we could support as an option, for example by reading this limit value from the document itself through a new setting like field_indexed_chars.

Then we could do something like:

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information. Used to parse pdf and office files",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "field_indexed_chars" : "size"
      }
    }
  ]
}

Then index either:

PUT index/doc/1?pipeline=attachment
{
  "data": "BASE64"
}

This will use the default value (or the one defined by indexed_chars).

Or

PUT index/doc/2?pipeline=attachment
{
  "data": "BASE64",
  "size": 1000
}
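
To make the idea concrete, here is a rough Python sketch of how the limit could be resolved for each document. It only models the proposal and is not the plugin's code: the helper names `resolve_indexed_chars` and `truncate` are made up for illustration, and the 100000 default is the one mentioned earlier in this thread.

```python
DEFAULT_INDEXED_CHARS = 100_000  # default mentioned earlier in the thread


def resolve_indexed_chars(doc, pipeline_indexed_chars=DEFAULT_INDEXED_CHARS,
                          field_indexed_chars=None):
    """Return the extraction limit for one document (hypothetical helper).

    If the pipeline names a field and the document carries it, that value
    wins; otherwise fall back to the pipeline-level limit.
    """
    if field_indexed_chars is not None:
        value = doc.get(field_indexed_chars)
        if value is not None:
            return int(value)
    return pipeline_indexed_chars


def truncate(text, limit):
    """Keep at most `limit` characters; a negative limit means no limit."""
    return text if limit < 0 else text[:limit]


# Document 1 has no "size" field, so the pipeline default applies.
limit_doc1 = resolve_indexed_chars({"data": "BASE64"}, field_indexed_chars="size")
# Document 2 carries its own limit in "size".
limit_doc2 = resolve_indexed_chars({"data": "BASE64", "size": 1000},
                                   field_indexed_chars="size")
```

With this model, document 1 falls back to the pipeline value and document 2 is limited to 1000 characters, which is the behavior described for the two requests above.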

Would you like to open a feature request for it?

If it's possible, yes, I'd like a feature request!

Many thanks!

I opened

Let's see how it goes.

Thank you David!

FYI I merged this today:

Should be available in 6.3.0 and later. :boom:
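
A note for anyone landing here later: as far as I know, the option that shipped is named `indexed_chars_field`, not the `field_indexed_chars` name floated earlier in this thread, so please check the Ingest Attachment docs for your version. Below is a small Python sketch that only builds the two request bodies (it sends nothing to a cluster). The helper `attachment_pipeline` is hypothetical, and the "size" field name is just the one used in the example above.

```python
import json


def attachment_pipeline(limit_field, default_limit=None):
    """Build a body for PUT _ingest/pipeline/<name> (hypothetical helper)."""
    processor = {
        "field": "data",
        # Assumed option name; verify against the docs for your version.
        "indexed_chars_field": limit_field,
    }
    if default_limit is not None:
        # Pipeline-level fallback used when a document has no limit field.
        processor["indexed_chars"] = default_limit
    return {
        "description": "Extract attachment information. "
                       "Used to parse pdf and office files",
        "processors": [{"attachment": processor}],
    }


pipeline_body = attachment_pipeline("size")
# A document that asks for at most 1000 extracted characters.
doc_body = {"data": "BASE64", "size": 1000}

print(json.dumps(pipeline_body, indent=2))
print(json.dumps(doc_body, indent=2))
```

The printed JSON can be pasted into the same PUT requests shown earlier in the thread.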

Great! Thanks a lot David! :clap::clap:
