Hi all, I'm using Elasticsearch 7.8 and I'm seeing a weird situation when using a bulk operation to index documents that may or may not have an attachment. For example, if I run the following command:
curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"index":{"_id":"1","_index":"t1"}}
{"active_user":false,"content":"Something I wrote", "document_id":"1","topic":"Test1"}
{"index":{"_id":"2","_index":"t1"}}
{"active_user":false,"content":"Something I wrote", "document_id":"2","topic":"Test2"}
{"index":{"_id":"3","_index":"t1","pipeline":"attachment"}}
{"data":"<BASE64ENCODEDPDFFILE>", "document_id":"3","topic":"Test PDF"}
'
I get this error in the ES log:
{"type": "server", "timestamp": "2020-08-12T19:00:50,149Z", "level": "DEBUG", "component": "o.e.a.b.T.BulkRequestModifier", "cluster.name": "docker-cluster", "node.name": "4e71dc30d5e2", "message": "failed to execute pipeline [_none] for document [t1/_doc/1]", "cluster.uuid": "3YRzz0W_RvGuSiJrIGC3GQ", "node.id": "qlGXQX8uRXum5ZGd9rdPcQ" ,
"stacktrace": ["org.elasticsearch.ingest.IngestProcessorException: ElasticsearchParseException[Error parsing document in field [data]]; nested:
TikaException[TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@565cd0c5]; nested: IOException[Page tree root must be a dictionary];"
All documents except the first one are indexed successfully. It looks like the pipeline is expecting a "data" field on the first document, but if I remove the first document, the second one raises the same error.
If I index the first document on its own in a separate bulk operation, it is indexed normally.
So it seems to be a problem with mixing attachment documents and other documents in the same bulk request.
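To see exactly which documents the bulk request rejected, rather than relying only on the server log, the per-item results in the bulk response can be inspected. The following is a minimal sketch, assuming the response has already been parsed into a Python dict. The function name `failed_items` and the sample response are hypothetical and written by hand for illustration; they are not real output from this cluster.

```python
def failed_items(bulk_response):
    """Return (id, status, error) for every bulk item that reports an error."""
    failures = []
    for item in bulk_response.get("items", []):
        # Each item is a dict with a single key naming the action
        # ("index", "create", "update" or "delete").
        for action, result in item.items():
            if "error" in result:
                failures.append(
                    (result.get("_id"), result.get("status"), result["error"])
                )
    return failures


# Hypothetical, hand-written response (NOT real output from the cluster).
sample_response = {
    "errors": True,
    "items": [
        {"index": {"_index": "t1", "_id": "1", "status": 400,
                   "error": {"type": "placeholder_type",
                             "reason": "placeholder reason"}}},
        {"index": {"_index": "t1", "_id": "2", "status": 201}},
        {"index": {"_index": "t1", "_id": "3", "status": 201}},
    ],
}

for doc_id, status, error in failed_items(sample_response):
    print(doc_id, status, error["reason"])
```

Comparing the failing `_id` values with the order of the action lines in the request shows whether the error is being reported against the document that actually carried the `data` field.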
Can someone help me understand what I'm doing wrong?
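To rule out shell quoting as a factor, the same request body can also be built programmatically. This is a minimal sketch, assuming the index name `t1` and the pipeline name `attachment` from the command above; the helper `build_bulk_body` is hypothetical, and the base64 value is a placeholder rather than a real PDF. It only builds the newline-delimited JSON body (which the bulk API requires to end with a newline) and does not send it.

```python
import base64
import json


def build_bulk_body(docs, index="t1", pipeline="attachment"):
    """Build an NDJSON bulk body; only docs with a "data" field get the pipeline."""
    lines = []
    for doc_id, source in docs:
        action = {"_index": index, "_id": doc_id}
        if "data" in source:
            # Assumption: only attachment documents should go through the pipeline.
            action["pipeline"] = pipeline
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(source))
    # The bulk API expects every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n"


# Placeholder payload for illustration only; this is not a valid PDF.
placeholder_data = base64.b64encode(b"placeholder bytes").decode("ascii")

docs = [
    ("1", {"active_user": False, "content": "Something I wrote",
           "document_id": "1", "topic": "Test1"}),
    ("2", {"active_user": False, "content": "Something I wrote",
           "document_id": "2", "topic": "Test2"}),
    ("3", {"data": placeholder_data, "document_id": "3", "topic": "Test PDF"}),
]

body = build_bulk_body(docs)
print(body)
```

The resulting string would be sent as the request body to the `_bulk` endpoint, so every action line is guaranteed to be valid JSON with the intended index name.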