Ingest attachment plugin, 2 fields with content - one encoded and one decoded

Is it necessary to keep the content of a PDF file in 2 fields in Elasticsearch when using the ingest attachment plugin?


"data": "JVBERi0xLjUNCiW1tbW1DQoxIDA

"attachment": {
"content_type": "application/pdf",
"author": "Sample",
"language": "en",
"title": "Title",
"content": """Some Content

The content of the file is in the data field (base64 encoded), and the data field is used to fill the attachment.content field (decoded string).
Can I keep just the one content field with the decoded string?

No, it's not necessary. It's better to remove the original data field IMO.

Does it have to be a 2-step process?

  1. Set the base64-encoded data field and index the document with ?pipeline=attachment in the URL. The language, author, content_length and content fields will be generated from the data field.
  2. Update the document with data=null.

Yes. It needs 2 steps.

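BTW, you can use the remove processor: placed after the attachment processor in the same pipeline, it drops the data field once the content has been extracted.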
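For reference, here is a minimal sketch of what such a pipeline definition could look like, written as a Python dict so it can be printed as JSON. The pipeline name attachment and the field name data come from the posts above; the helper name build_attachment_pipeline and the description text are made up for illustration, and nothing here talks to a cluster.

```python
import json


def build_attachment_pipeline(source_field: str = "data") -> dict:
    """Build an ingest pipeline body (hypothetical helper, not part of any client).

    The attachment processor decodes the base64 field into `attachment.*`,
    then the remove processor deletes the base64 source field.
    """
    return {
        "description": "Extract attachment info, then drop the base64 source",
        "processors": [
            {"attachment": {"field": source_field}},
            {"remove": {"field": source_field}},
        ],
    }


pipeline = build_attachment_pipeline()
# This JSON would be the body of: PUT _ingest/pipeline/attachment
print(json.dumps(pipeline, indent=2))
```

Assuming the pipeline is stored under the name attachment, indexing with ?pipeline=attachment should then leave only the attachment field in the stored document, with no second update needed.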

I use the attachment processor in a foreach loop, so I added a second foreach loop with a remove processor to the processors section, and this worked like a charm :smiley:
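In case it helps someone else, this is roughly the shape of the pipeline, sketched as a Python dict. The array field name attachments and the helper name build_foreach_pipeline are assumptions for illustration (my real field names differ); _ingest._value is how the foreach processor exposes the current array element.

```python
import json


def build_foreach_pipeline(array_field: str = "attachments") -> dict:
    """Build a pipeline body for an array of attachments (hypothetical helper).

    Each array element is assumed to hold its base64 payload in `data`.
    """
    return {
        "description": "Extract each attachment, then drop each base64 source",
        "processors": [
            {
                # First loop: decode every element's base64 `data` field.
                "foreach": {
                    "field": array_field,
                    "processor": {
                        "attachment": {
                            "field": "_ingest._value.data",
                            "target_field": "_ingest._value.attachment",
                        }
                    },
                }
            },
            {
                # Second loop: remove the base64 `data` field from every element.
                "foreach": {
                    "field": array_field,
                    "processor": {"remove": {"field": "_ingest._value.data"}},
                }
            },
        ],
    }


print(json.dumps(build_foreach_pipeline(), indent=2))
```

The two loops are kept separate here because that is what I described above: the first fills attachment for each element, the second strips the encoded data.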

Thank you for your help!
