I have a workflow for storing a range of binary document types. Some are handled by Tika; others need pre-processing, with the JSON generated in a script and indexed using PUT. For the documents handled by the ingest-attachment plugin, I want to 'flatten' the JSON for the document: instead of an "attachment" field with nested keys, I want those keys at the top level. So instead of something like this in the indexed document:
{
  "attachment": {
    "date": "2017-03-28T05:12:37Z",
    "content_type": "application/pdf",
    "language": "en",
    "content": "Text from PDF document"
  }
}
I would want something like this (with some of the field names changed):
{
  "ModDate": "2017-03-28T05:12:37Z",
  "content_type": "application/pdf",
  "language": "en",
  "content": "Text from PDF document"
}
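To be precise about the transformation I'm after, here it is as a small Python function operating on a plain dict. This only illustrates the desired result and is not something that runs inside Elasticsearch; the function name `flatten_attachment` and its `renames` argument are made up for this post.

```python
from typing import Any, Dict, Optional


def flatten_attachment(
    doc: Dict[str, Any],
    renames: Optional[Dict[str, str]] = None,
    nested_field: str = "attachment",
) -> Dict[str, Any]:
    """Return a copy of doc with the keys under nested_field moved to the top level.

    Hypothetical helper: keys listed in renames get a new name, all other
    nested keys keep theirs. A nested key overwrites a top-level key that
    ends up with the same name.
    """
    renames = renames or {}
    # Keep every top-level field except the nested object itself.
    flat = {key: value for key, value in doc.items() if key != nested_field}
    # Hoist each nested key, applying the rename table where one is given.
    for key, value in doc.get(nested_field, {}).items():
        flat[renames.get(key, key)] = value
    return flat


nested = {
    "attachment": {
        "date": "2017-03-28T05:12:37Z",
        "content_type": "application/pdf",
        "language": "en",
        "content": "Text from PDF document",
    }
}

flat = flatten_attachment(nested, renames={"date": "ModDate"})
print(flat)
```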
How would I achieve this? Is there anything stupid about this idea that I've not realised?
Can I change the mapping in the plugin's pipeline definition (I'm not sure this is the right terminology), or do I need to reindex after the documents have been indexed?
The requirement comes about because most of the documents that I'm indexing need pre-processing in closed-source software. For those, I don't send the binary document to ES, just a JSON document with a set of common fields (mostly the file metadata) and any number of auto-generated fields.
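In case it makes the question more concrete, this is the kind of pipeline I imagine might do it: the `attachment` processor, followed by one `rename` processor per field, and a final `remove` for whatever is left of the nested object. It is an untested sketch written as a Python script that builds the pipeline body. The helper name `build_flatten_pipeline` and the source field name `data` are my assumptions, and whether `ignore_missing` is available on `rename` may depend on the Elasticsearch version.

```python
import json
from typing import Any, Dict, List


def build_flatten_pipeline(
    renames: Dict[str, str],
    source_field: str = "data",  # assumed name of the field holding the base64 document
    nested_field: str = "attachment",
) -> Dict[str, Any]:
    """Build an ingest pipeline body that extracts an attachment and hoists its keys.

    Hypothetical helper: renames maps a key under nested_field to the
    top-level field name it should end up with.
    """
    processors: List[Dict[str, Any]] = [
        # Let the ingest-attachment plugin populate the nested object first.
        {"attachment": {"field": source_field, "target_field": nested_field}}
    ]
    for old_key, new_name in renames.items():
        processors.append(
            {
                "rename": {
                    "field": f"{nested_field}.{old_key}",
                    "target_field": new_name,
                    # Not every document yields every key (assumption:
                    # this option exists in the version in use).
                    "ignore_missing": True,
                }
            }
        )
    # Drop the nested object once the wanted keys have been moved out.
    processors.append({"remove": {"field": nested_field}})
    return {
        "description": "Extract attachment and flatten its fields",
        "processors": processors,
    }


pipeline = build_flatten_pipeline(
    {
        "date": "ModDate",
        "content_type": "content_type",
        "language": "language",
        "content": "content",
    }
)
print(json.dumps(pipeline, indent=2))
```

My understanding is that the printed body would be sent with PUT to `_ingest/pipeline/<name>` and then selected at index time with the `pipeline` request parameter, but I have not verified that this gives the flat result shown above.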