Change mapping in pipeline for ingest-attachment plugin

I have a workflow for storing a range of binary document types (some handled by Tika, others that need pre-processing, with JSON generated by a script and indexed using PUT). For the documents handled by the ingest-attachment plugin, I want to 'flatten' the JSON for each document: instead of having an "attachment" field with nested keys, I want those fields at the top level. So instead of something like this in the index results:

{
    "attachment": {
        "date": "2017-03-28T05:12:37Z",
        "content_type": "application/pdf",
        "language": "en",
        "content": "Text from PDF document"
    }
}

I would want something like this (with some of the field names changed):

{
    "ModDate": "2017-03-28T05:12:37Z",
    "content_type": "application/pdf",
    "language": "en",
    "content": "Text from PDF document"
}

How would I achieve this? Is there anything stupid about this idea that I've not realised?
Can I change the mapping in the plugin's pipeline definition (I don't even know if this is the right terminology), or do I need to reindex the documents after they have been indexed?

The requirement comes about because most of the documents that I'm indexing need pre-processing in closed-source software. I don't send the binary document to ES, just a JSON document with a set of common fields (mostly the file metadata) and any number of auto-generated fields.

I believe you need to add a rename processor after the attachment processor: https://www.elastic.co/guide/en/elasticsearch/reference/6.5/rename-processor.html
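As a minimal sketch of that suggestion, assuming the base64-encoded file arrives in a field named `data` and using a hypothetical pipeline name `flatten-attachment`, the body of `PUT _ingest/pipeline/flatten-attachment` could chain the attachment processor, one rename processor per field (target names taken from the example in the question), and a remove processor for the now-empty `attachment` object. `ignore_missing` is set because Tika does not always extract every property (for example, `date`):

```json
{
  "description": "Hypothetical example: extract with ingest-attachment, then move the fields to the top level",
  "processors": [
    { "attachment": { "field": "data", "properties": ["content", "content_type", "language", "date"] } },
    { "rename": { "field": "attachment.date", "target_field": "ModDate", "ignore_missing": true } },
    { "rename": { "field": "attachment.content_type", "target_field": "content_type", "ignore_missing": true } },
    { "rename": { "field": "attachment.language", "target_field": "language", "ignore_missing": true } },
    { "rename": { "field": "attachment.content", "target_field": "content", "ignore_missing": true } },
    { "remove": { "field": "attachment" } }
  ]
}
```

Because the pipeline runs at ingest time, documents indexed with it (e.g. `PUT my_index/_doc/1?pipeline=flatten-attachment`, where `my_index` is a placeholder) are stored with the flattened fields, so no separate reindex step is needed for them. Documents that were already indexed without the pipeline would still have to be reindexed through it.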


Perfect, thank you.
