Hello!
I am currently getting started with v5 and want to use the Ingest Attachment Processor Plugin (https://www.elastic.co/guide/en/elasticsearch/plugins/current/ingest-attachment.html) in combination with the Array "datatype" (https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html).
A document looks roughly like this:
{
  "_index": "visual-draft-3",
  "_type": "document",
  "_id": "441FDE6CFFF3D11EC12570F10053DE49",
  "_score": 1,
  "_source": {
    "EingangMuster": null,
    "Produkt_13": null,
    "Produkt_11": null,
    "attachments": [
      {
        "filename": "somedoc.pdf",
        "data": "base64DataString"
      },
      {
        "filename": "somedoc.docx",
        "data": "base64DataString"
      }
    ]
  }
}
Ingest Pipeline:
esClient.ingest.putPipeline({
  id: "attachment_pipe",
  body: {
    "description": "Process document attachments",
    "processors": [{
      "attachment": {
        "field": "data",
        "indexed_chars": -1
      }
    }]
  }
}, function(error, response) {
  console.log(error, response);
});
This does not work out of the box for me, because the data field is no longer at the top level of the document: each element of the attachments array carries its own data field, while the pipeline expects a single top-level data field. The error output:
org.elasticsearch.ElasticsearchException: java.lang.IllegalArgumentException: ElasticsearchParseException[Error parsing document in field [data]]; nested: IllegalArgumentException[field [data] not present as part of path [data]];
at org.elasticsearch.ingest.CompoundProcessor.newCompoundProcessorException(CompoundProcessor.java:156) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:107) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:58) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.ingest.PipelineExecutionService.innerExecute(PipelineExecutionService.java:166) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.ingest.PipelineExecutionService.access$000(PipelineExecutionService.java:41) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.ingest.PipelineExecutionService$1.doRun(PipelineExecutionService.java:65) [elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:520) [elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.0.1.jar:5.0.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_92]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_92]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_92]
Caused by: java.lang.IllegalArgumentException: ElasticsearchParseException[Error parsing document in field [data]]; nested: IllegalArgumentException[field [data] not present as part of path [data]];
... 11 more
Caused by: org.elasticsearch.ElasticsearchParseException: Error parsing document in field [data]
at org.elasticsearch.ingest.attachment.AttachmentProcessor.execute(AttachmentProcessor.java:131) ~[?:?]
at org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100) ~[elasticsearch-5.0.1.jar:5.0.1]
... 9 more
Caused by: java.lang.IllegalArgumentException: field [data] not present as part of path [data]
at org.elasticsearch.ingest.IngestDocument.resolve(IngestDocument.java:308) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.ingest.IngestDocument.getFieldValue(IngestDocument.java:114) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.ingest.IngestDocument.getFieldValueAsBytes(IngestDocument.java:141) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.ingest.attachment.AttachmentProcessor.execute(AttachmentProcessor.java:71) ~[?:?]
at org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100) ~[elasticsearch-5.0.1.jar:5.0.1]
... 9 more
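My reading of the trace, as a rough sketch: the processor looks up data at the top level of _source, but the field only exists inside each element of attachments. The lookupPath helper below is hypothetical and written only for this post; it is not how Elasticsearch resolves field paths internally, it just mimics a dot-separated lookup on the document above.

```javascript
// Hypothetical helper, for illustration only; this is not Elasticsearch code.
// Walks a dot-separated path through nested objects and arrays.
function lookupPath(source, path) {
  let current = source;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object" || !(key in current)) {
      return undefined; // this path element does not exist
    }
    current = current[key];
  }
  return current;
}

// Same shape as the _source of the document shown earlier.
const source = {
  EingangMuster: null,
  attachments: [
    { filename: "somedoc.pdf", data: "base64DataString" },
    { filename: "somedoc.docx", data: "base64DataString" }
  ]
};

console.log(lookupPath(source, "data"));               // prints undefined
console.log(lookupPath(source, "attachments.0.data")); // prints base64DataString
```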
How can the ingest plugin access the data field inside each element of the array?
Possible alternatives:
- Create a separate type for attachments and use the plugin there (I would then have to combine query results).
- Add n data fields to each document and put the attachment data there (I would rather not do this: many null fields and a cluttered structure, and sometimes n > 50).
I welcome any ideas, solutions and creative input!
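One direction I am looking at, as an untested sketch: wrap the attachment processor in a foreach processor so it runs once per element of attachments. This assumes the foreach processor with a single "processor" entry and the _ingest._value placeholder are available in the installed version; the two function names are hypothetical, and only esClient.ingest.putPipeline and the pipeline id come from my code above.

```javascript
// Sketch only. Assumption: the installed Elasticsearch version supports the
// foreach processor and exposes the current array element as _ingest._value.
function buildAttachmentArrayPipeline() {
  return {
    description: "Process every entry of the attachments array",
    processors: [{
      foreach: {
        field: "attachments",
        processor: {
          attachment: {
            // read the base64 data of the current array element
            field: "_ingest._value.data",
            // write the extracted content back into the same element
            target_field: "_ingest._value.attachment",
            indexed_chars: -1
          }
        }
      }
    }]
  };
}

// Hypothetical helper: registers the pipeline through an existing client.
function putAttachmentArrayPipeline(esClient, callback) {
  esClient.ingest.putPipeline({
    id: "attachment_pipe",
    body: buildAttachmentArrayPipeline()
  }, callback);
}
```

If this works as I expect, each element of attachments would end up with its own attachment object next to filename and data, so no extra type and no n separate data fields would be needed. It would be called as putAttachmentArrayPipeline(esClient, function(error, response) { console.log(error, response); });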