Ingest attachment plugin: two fields with content, one encoded and one decoded


(Marcin Bednarek) #1

Is it necessary to keep the content of a PDF file in two fields in Elasticsearch when using the ingest attachment plugin?

Example:

"data": "JVBERi0xLjUNCiW1tbW1DQoxIDA

"attachment": {
"content_type": "application/pdf",
"author": "Sample",
"language": "en",
"title": "Title",
"content": """Some Content

The file content is in the data field (base64-encoded), and the data field is used to fill the content field (decoded string).
Can I have just one field, content, with the decoded string?
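For context, the data value in the example above is just a base64-encoded PDF. Its first characters decode to the standard %PDF-1.5 file header. The attachment processor decodes this field and extracts text and metadata (via Apache Tika) into the attachment object. The Python sketch below shows how a client might build such a document body. The helper name build_attachment_doc is hypothetical, and the sample bytes are simply what the truncated prefix in the example decodes to, not a complete PDF.

```python
import base64
import json


def build_attachment_doc(file_bytes: bytes) -> dict:
    """Hypothetical helper: wrap raw file bytes as a base64 "data" field.

    The ingest attachment processor later reads "data" and writes the
    extracted text/metadata under "attachment" (e.g. attachment.content).
    """
    return {"data": base64.b64encode(file_bytes).decode("ascii")}


if __name__ == "__main__":
    # Only the first 20 bytes of a PDF (its header), used for illustration.
    pdf_header = b"%PDF-1.5\r\n%\xb5\xb5\xb5\xb5\r\n1 0"
    print(json.dumps(build_attachment_doc(pdf_header)))
```

Once the pipeline has run, everything a search needs is in attachment.content, which is why the base64 copy in data is usually dead weight.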


(David Pilato) #2

No. IMO it's better to remove the original data field.


(Marcin Bednarek) #3

Does it have to be a two-step process?

  1. Set the base64-encoded data field and index the document with ?pipeline=attachment in the URL. The language, author, content_length and content fields are generated from the data field.
  2. Update the document with data=null.
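The two requests in this proposal could look roughly like the Python sketch below. It only builds the requests with the standard library and does not send them. The cluster URL, the index name my-index and the helper names are assumptions, and the _update endpoint path shown is the Elasticsearch 7.x+ form.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unauthenticated cluster
INDEX = "my-index"                 # hypothetical index name
HEADERS = {"Content-Type": "application/json"}


def index_request(doc_id: str, data_b64: str) -> urllib.request.Request:
    """Step 1: index the document through the "attachment" ingest pipeline."""
    body = json.dumps({"data": data_b64}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_doc/{doc_id}?pipeline=attachment",
        data=body, headers=HEADERS, method="PUT",
    )


def clear_data_request(doc_id: str) -> urllib.request.Request:
    """Step 2: partial update that overwrites the data field with null."""
    body = json.dumps({"doc": {"data": None}}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_update/{doc_id}",
        data=body, headers=HEADERS, method="POST",
    )


# Usage (requires a running cluster):
#   urllib.request.urlopen(index_request("1", data_b64))
#   urllib.request.urlopen(clear_data_request("1"))
```

The drawback of this approach is that every document is written twice, and the partial update keeps a data key (set to null) in _source.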

(David Pilato) #4

Yes, it needs two steps.

BTW, you can use the remove processor: https://www.elastic.co/guide/en/elasticsearch/reference/current/remove-processor.html
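One way to read this suggestion is that both steps go into the same ingest pipeline. Processors run in order, so a remove processor placed after the attachment processor drops data only after the text has been extracted, and the document is indexed with a single request. A minimal Python sketch of such a pipeline definition follows. The pipeline name attachment matches the ?pipeline=attachment URL above, while the cluster URL and the helper name are assumptions.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unauthenticated cluster

# Extract with the attachment processor, then drop the base64 source field.
ATTACHMENT_PIPELINE = {
    "description": "Extract attachment text and drop the base64 source",
    "processors": [
        {"attachment": {"field": "data"}},  # writes attachment.* from data
        {"remove": {"field": "data"}},      # then deletes data itself
    ],
}


def put_pipeline_request(name: str = "attachment") -> urllib.request.Request:
    """Hypothetical helper: build (not send) the PUT _ingest/pipeline request."""
    return urllib.request.Request(
        f"{ES_URL}/_ingest/pipeline/{name}",
        data=json.dumps(ATTACHMENT_PIPELINE).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

With this pipeline in place, indexing with ?pipeline=attachment stores the attachment object but no data field, so the follow-up update is no longer needed.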


(Marcin Bednarek) #5

I use the attachment processor in a foreach loop, so I added a second foreach loop with the remove processor to the processors section, and this worked like a charm :smiley:
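For documents that carry several files in an array, the setup described here might look like the pipeline body below: one foreach runs the attachment processor on each element, and a second foreach runs the remove processor on each element's data. This is a sketch, not the poster's actual pipeline. The array field name attachments is an assumption, and _ingest._value refers to the current element inside a foreach.

```python
import json

# Assumed document shape: {"attachments": [{"data": "<base64>"}, ...]}
FOREACH_PIPELINE = {
    "description": "Extract each attachment, then drop each base64 source",
    "processors": [
        {
            "foreach": {
                "field": "attachments",
                "processor": {
                    "attachment": {
                        "field": "_ingest._value.data",
                        "target_field": "_ingest._value.attachment",
                    }
                },
            }
        },
        {
            # Second loop: remove the now-redundant base64 field per element.
            "foreach": {
                "field": "attachments",
                "processor": {"remove": {"field": "_ingest._value.data"}},
            }
        },
    ],
}

if __name__ == "__main__":
    print(json.dumps(FOREACH_PIPELINE, indent=2))
```

Two loops are used because a foreach processor wraps a single inner processor.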

Thank you for your help!

