ES 2.0 and attachments plugin issue


It used to be that this would work:

"file" : {
    "properties" : {
        "attachment" : {
            "type" : "attachment",
            "path" : "full",
            "fields": {
                "attachment": {
                    "type": "string",
                    "term_vector": "with_positions_offsets",
                    "store": true
                }
            }
        }
    }
}

But since upgrading to ES 2.0 from 1.5.2, when trying to create the index, I get the following error:
MapperParsingException[Mapping definition for [fields] has unsupported parameters: [attachment : {type=string}]];

The current README for the plugin shows that under 'fields' you now need to use 'content' instead of 'attachment' (or whatever you named your attachment field).

You also have to change every place where you refer to the attachment.
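
For reference, here is a sketch of how I understand the README's convention applies to the mapping above. The only change I'm sure of is the sub-field name 'content'. I've left out "path" because I don't know whether it is still accepted in 2.0, and the remaining settings are simply carried over from my old mapping:

```json
"file" : {
    "properties" : {
        "attachment" : {
            "type" : "attachment",
            "fields" : {
                "content" : {
                    "type" : "string",
                    "term_vector" : "with_positions_offsets",
                    "store" : true
                }
            }
        }
    }
}
```

With this layout the extracted text would be addressed as 'attachment.content' in queries and highlighting, rather than as 'attachment'.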

  1. What caused this change? Was this intentional?

  2. Before this change, it seemed that the entire attachment was stored and accessible via 'attachment' as the only field name. What was going on? Was the content being mapped to the name 'attachment' so you wouldn't need to reference it by 'attachment.content'?

(David Pilato) #2

Yes. It changed.
Read this commit:


OK, thanks.

  1. So before the change it was possible to exclude the attachment ('attachment' in the example above) from _source (to avoid storing the binary encoded content), but then later access it by adding it to 'fields' in a search (i.e. fields: ['attachment']). Now, should 'attachment.content' be added to 'excludes' and 'attachment.content' added to 'fields'?

  2. Can the whole 'attachment' instead be stored along with its sub fields, and then included in 'excludes' and 'fields'?
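
To make the question concrete, this is roughly the setup I have in mind. It is an untested sketch: it assumes the base64 payload still sits under 'attachment' in _source and that the extracted text is stored as 'attachment.content' (the "properties" section is omitted and would be unchanged):

```json
"file" : {
    "_source" : {
        "excludes" : ["attachment"]
    }
}
```

The search would then ask for the stored sub-field explicitly (the query text is just a placeholder):

```json
{
    "fields" : ["attachment.content"],
    "query" : {
        "match" : { "attachment.content" : "some text" }
    }
}
```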

(David Pilato) #4

I think that you can still exclude the attachment from the _source field.
But you won't be able to get it back then.

But read the _source documentation: excluding fields from _source is not recommended.

If you are coding in Java, I'd recommend not using this plugin and instead calling Tika directly in your application. This is what I did in the fscrawler project.
