Inconsistent AWS OpenSearch ingest attachment processor behaviour

We are using AWS OpenSearch for one of our applications. We have configured the ingest attachment processor to extract text from .docx files.

Here are our environment setup details.

Note: For DEV/QA, we use the same instance with different index names.

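| Details                  | DEV/QA          | UAT               |
|--------------------------|-----------------|-------------------|
| Version                  | 1.1             | 1.1               |
| Service Software Version | R20220223-P6    | R20220928-P1      |
| Dedicated Master Node    | No              | Yes               |
| No of nodes              | 2               | 3                 |
| Instance Type            | m5.large.search | m6g.xlarge.search |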

Flow: The Base64-encoded .docx file is sent to the ingest attachment processor at index time; the text is extracted and saved into a field in OpenSearch.
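For readers trying to reproduce the flow, here is a minimal sketch in Python of what the pipeline definition and document payload might look like. The field name `form_data` and the pipeline name `attachment_dev` are taken from the error log in this post; the `attachment_error` field and both helper function names are hypothetical and only serve as an illustration, not the actual configuration in use.

```python
import base64


def build_attachment_pipeline(field: str = "form_data") -> dict:
    """Return a pipeline body with one attachment processor.

    The on_failure handler is an assumption: it stores the failure
    message in a hypothetical `attachment_error` field instead of
    rejecting the whole document.
    """
    return {
        "description": "Extract text from a Base64-encoded attachment",
        "processors": [
            {
                "attachment": {
                    "field": field,
                    "on_failure": [
                        {
                            "set": {
                                "field": "attachment_error",
                                "value": "{{ _ingest.on_failure_message }}",
                            }
                        }
                    ],
                }
            }
        ],
    }


def build_document(raw_docx: bytes, field: str = "form_data") -> dict:
    """Base64-encode the file bytes into the field the processor reads."""
    return {field: base64.b64encode(raw_docx).decode("ascii")}
```

The pipeline body would be sent with `PUT _ingest/pipeline/attachment_dev`, and the document indexed with `?pipeline=attachment_dev`. By default the processor writes the extracted text to `attachment.content`.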

Issue (inconsistent behavior): One specific .docx file cannot be parsed by the OpenSearch attachment processor, which throws the error below.

[logstash.outputs.amazonelasticsearch][main][] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>"25220", :_index=>"test_index", :_type=>"_doc", :_routing=>nil, :pipeline=>"attachment_dev"}, #<LogStash::Event:0x904adf8>], :response=>{"index"=>{"_index"=>"test_index", "_type"=>"_doc", "_id"=>"25220", "status"=>400, "error"=>{"type"=>"parse_exception", "reason"=>"Error parsing document in field [form_data]", "caused_by"=>{"type"=>"tika_exception", "reason"=>"TIKA-198: Illegal IOException from org.apache.tika.parser.ParserDecorator$2@298392f1", "caused_by"=>{"type"=>"i_o_exception", "reason"=>"No such file or directory"}}}}}.

I use an "on_failure" configuration to handle this, but the behavior is inconsistent when I simulate the text extraction (_ingest/pipeline/<>/_simulate) by providing the Base64 in the request body:

On DEV/QA: if I send the request 5 times in a row, the text is extracted properly 4 times; the remaining time it is not extracted and the on_failure handler is executed.

On UAT: if I send the request 5 times in a row, the text is not extracted any of the 5 times and the on_failure handler is executed every time.

The ingest attachment configuration and plugin version (1.1.0) are the same in both environments. Kindly help in resolving this issue.
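The repeated simulate calls described above could be scripted to get a reproducible success/failure count per environment. This is only a sketch: `send` is a hypothetical callable that POSTs the body to `_ingest/pipeline/<>/_simulate` and returns the parsed JSON response (how it authenticates against the AWS domain is left out), and the default `form_data` field name comes from the error log.

```python
from collections import Counter
from typing import Callable


def simulate_body(b64_docx: str, field: str = "form_data") -> dict:
    """Request body for the simulate API: one document to run through the pipeline."""
    return {"docs": [{"_source": {field: b64_docx}}]}


def classify(result: dict) -> str:
    """Label one entry of the simulate response's `docs` array."""
    if "error" in result:
        # The failure was not caught by any on_failure handler.
        return "unhandled_error"
    source = result.get("doc", {}).get("_source", {})
    content = source.get("attachment", {}).get("content")
    # Assumption: no extracted content means the on_failure path ran.
    return "extracted" if content else "not_extracted"


def tally_simulations(
    send: Callable[[dict], dict], body: dict, attempts: int = 5
) -> Counter:
    """Send the same simulate request several times and count the outcomes."""
    counts: Counter = Counter()
    for _ in range(attempts):
        response = send(body)
        for result in response.get("docs", []):
            counts[classify(result)] += 1
    return counts
```

Running `tally_simulations(send, simulate_body(b64), attempts=5)` against each environment gives numbers that can be compared directly. Since a simulate request is presumably handled by whichever node receives it, intermittent failures on the 2-node domain and constant failures on the 3-node domain might point to a per-node difference, but that is a guess and not something the output alone confirms.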

OpenSearch/OpenDistro are AWS run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )

OpenSearch is not supported here. It is a fork of an old version of Elasticsearch, with some changes made by AWS.

You need to ask about this on an OpenSearch forum.


But the plugin we use for attachment processing is the one below

Yeah, but unless you can replicate the same issue on the latest version of Elasticsearch, it is not possible to know whether the cause is the plugin or something AWS changed in the OpenSearch fork.

If this is a bug in the plugin, Elastic will only address it if you can replicate it on the latest version of Elasticsearch.

