Elasticsearch-mapper-attachments: Losing file.content on _update

Hi,

I have a problem related to the mapping I use:

PUT documents/myDoc/_mapping
{
    "myDoc": {
        "_source": {
            "excludes": [
                "file"
            ]
        },
        "properties": {
            "title": {
                "type": "string"
            },
            "file": {
                "type": "attachment",
                "fields": {
                    "content": {
                        "type": "string",
                        "term_vector": "with_positions_offsets",
                        "store": true
                    }
                }
            }
        }
    }
}
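
One way to keep a request body like this well-formed (every brace balanced) is to build it as a data structure and let a serializer emit the JSON. A minimal Python sketch using only the standard library; the variable names are illustrative, and sending the body to PUT documents/myDoc/_mapping is left to whatever HTTP client you use:

```python
import json

# The myDoc mapping as a Python dict. json.dumps() always emits balanced
# braces and brackets, so the request body cannot end up malformed.
MAPPING = {
    "myDoc": {
        # Keep the raw attachment out of _source.
        "_source": {"excludes": ["file"]},
        "properties": {
            "title": {"type": "string"},
            "file": {
                "type": "attachment",
                "fields": {
                    # Extracted text: stored, with offsets for highlighting.
                    "content": {
                        "type": "string",
                        "term_vector": "with_positions_offsets",
                        "store": True,
                    }
                },
            },
        },
    }
}

body = json.dumps(MAPPING, indent=4)

# Parsing the text back yields the same structure, so the body is valid JSON.
assert json.loads(body) == MAPPING
print(body)
```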

I'm doing this so that the file content is not kept in the _source, while still being able to retrieve highlighted fragments of the extracted text.
I set file._content to the document content, and I can then query the extracted text through file.content. This works fine.
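
For reference, a search that matches on the extracted text and asks for highlighted fragments of file.content could use a body like the one below. This is a sketch: the search term is a placeholder, build_highlight_search is a hypothetical helper name, and the body would be sent to POST documents/myDoc/_search.

```python
import json

def build_highlight_search(text: str) -> dict:
    """Hypothetical helper: a search body that matches `text` in the
    extracted attachment text and requests highlighted fragments."""
    return {
        "query": {"match": {"file.content": text}},
        # file.content is stored (store: true), so the highlighter can load
        # it even though "file" is excluded from _source.
        "highlight": {"fields": {"file.content": {}}},
    }

# Placeholder search term, for illustration only.
print(json.dumps(build_highlight_search("invoice"), indent=2))
```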

However, I have an issue when I try to update an entry. For instance, if I run:

POST documents/myDoc/123456789/_update
{
  "doc": { "title": "my new title"}
}
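
The doc in a partial update is merged into the existing _source. The function below is a simplified model of that merge (nested objects merged key by key, other values replaced); it is an illustration rather than Elasticsearch code, and the meta field in the example is hypothetical:

```python
import copy

def merge_partial_doc(source: dict, doc: dict) -> dict:
    """Return a copy of `source` with the partial `doc` merged into it.

    Nested objects are merged recursively; scalars and lists in `doc`
    replace whatever was in `source`. `source` itself is not modified.
    """
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial_doc(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

# "meta" is a hypothetical nested field used only to show the recursion.
before = {"title": "old title", "meta": {"author": "me", "pages": 3}}
after = merge_partial_doc(before, {"title": "my new title", "meta": {"pages": 4}})
print(after)  # {'title': 'my new title', 'meta': {'author': 'me', 'pages': 4}}
```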

After this update, file.content is empty. From what I understand, this is the standard behavior: file is excluded from the _source, and file.content is not in the _source either. On a partial update, Elasticsearch loses the fields that are not in the _source.
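
The loss can be reproduced with a toy model: the stored _source is the indexed document minus the excluded fields, and the update starts from that stored copy. In the sketch below (plain Python, not Elasticsearch code; extracted_content is a hypothetical stand-in for the attachment text extraction), the reindexed document no longer has a file field, so there is nothing to extract into file.content:

```python
def apply_source_excludes(doc: dict, excludes: list) -> dict:
    """Toy model of _source excludes: drop the listed top-level fields."""
    return {k: v for k, v in doc.items() if k not in excludes}

def extracted_content(doc: dict) -> str:
    """Hypothetical stand-in for attachment extraction: a placeholder text
    when there is a file._content to read from, otherwise ""."""
    return "extracted text" if "_content" in doc.get("file", {}) else ""

indexed = {"title": "my title", "file": {"_content": "<base64 file data>"}}
stored_source = apply_source_excludes(indexed, ["file"])

# _update: merge the partial doc into the *stored* source, then reindex that.
reindexed = {**stored_source, "title": "my new title"}

print(repr(extracted_content(indexed)))    # 'extracted text'
print(repr(extracted_content(reindexed)))  # ''
```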

I know I can re-upload the document on every update, but that is not convenient at all. My question is: is there a way to force file.content (the extracted text) to be part of the _source, so that I do not lose the extracted text on a partial update?
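
For completeness, the re-upload workaround amounts to a full index request (for example PUT documents/myDoc/123456789) that carries the new field values and the original file again, base64-encoded in file._content, which is the form mapper-attachments reads. A minimal sketch, assuming the client still has the original file bytes; build_reindex_body is a hypothetical helper:

```python
import base64
import json

def build_reindex_body(fields: dict, file_bytes: bytes) -> dict:
    """Hypothetical helper: the full document to send on every change.

    `fields` holds the regular fields (title, ...); the raw file is re-sent
    base64-encoded in file._content so that the attachment mapper extracts
    file.content again.
    """
    body = dict(fields)
    body["file"] = {"_content": base64.b64encode(file_bytes).decode("ascii")}
    return body

# Placeholder bytes standing in for the real document.
body = build_reindex_body({"title": "my new title"}, b"raw file bytes")
print(json.dumps(body))
```

The cost of this approach is that the file travels over the network again on each change, which is exactly the inconvenience described above.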

Thanks in advance for your answer.

Not unless you store it.

What do you mean by that?
The file.content field is already stored. It is not part of the source, though. Only file._content is.

Is there something else I can do?

It's because an update needs the _source to exist; otherwise it sees nothing there and assumes that whatever you pass in is the whole _source.

See https://www.elastic.co/guide/en/elasticsearch/reference/2.3/docs-update.html#docs-update

Note, this operation still means full reindex of the document, it just removes some network roundtrips and reduces chances of version conflicts between the get and the index. The _source field needs to be enabled for this feature to work.
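
The note above describes the update API as a get, a merge, and a full reindex performed inside Elasticsearch. The toy class below is an in-memory illustration, not an Elasticsearch client; it follows those three steps and, matching the documented requirement that _source be enabled, refuses to update a document whose source was not kept:

```python
class ToyIndex:
    """In-memory toy that mimics update = get + merge + full reindex."""

    def __init__(self, source_enabled: bool = True):
        self.source_enabled = source_enabled
        self.sources = {}   # doc_id -> stored _source (None if disabled)
        self.versions = {}  # doc_id -> version number

    def index(self, doc_id, doc: dict) -> int:
        self.sources[doc_id] = dict(doc) if self.source_enabled else None
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return self.versions[doc_id]

    def update(self, doc_id, partial: dict) -> int:
        source = self.sources.get(doc_id)  # 1. get the stored source
        if source is None:
            raise ValueError(f"cannot update {doc_id!r}: no _source stored")
        merged = {**source, **partial}     # 2. merge (top level only here)
        return self.index(doc_id, merged)  # 3. full reindex, new version

idx = ToyIndex()
idx.index("123456789", {"title": "my title"})
print(idx.update("123456789", {"title": "my new title"}))  # 2
print(idx.sources["123456789"])  # {'title': 'my new title'}
```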

I understand that.

This is why I'm trying to add file.content to the _source. I guess this is the only way to avoid storing the complete file in ES while still being able to use _update.

Any ideas on how to do that?

As I wrote here: https://github.com/elastic/elasticsearch-mapper-attachments/issues/209#issuecomment-207363491

For now, I don't believe the update API can work with attachments.