Hi,
I have a problem related to the mapping I use:
PUT documents/myDoc/_mapping
{
  "myDoc": {
    "_source": {
      "excludes": [
        "file"
      ]
    },
    "properties": {
      "title": {
        "type": "string"
      },
      "file": {
        "type": "attachment",
        "fields": {
          "content": {
            "type": "string",
            "term_vector": "with_positions_offsets",
            "store": true
          }
        }
      }
    }
  }
}
I do this to avoid storing the file content in the _source while still being able to retrieve highlighted snippets from the extracted text.
I set file._content to the document content, and I can then query the extracted text through file.content. This works fine.
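For reference, here is a rough Python sketch of the request bodies I mean. The sample bytes, the title, and the search term "uploaded" are made up for illustration, and I am assuming the attachment is sent base64-encoded under file._content; this only builds the JSON and does not talk to a cluster.

```python
import base64
import json

# Hypothetical file content; in practice this would be the bytes of a PDF, DOC, etc.
raw_bytes = b"Some text inside the uploaded document."

# Body for indexing a document: the attachment goes into file._content as base64.
index_body = {
    "title": "my title",
    "file": {"_content": base64.b64encode(raw_bytes).decode("ascii")},
}

# Body for searching the extracted text and asking for highlights on file.content.
search_body = {
    "query": {"match": {"file.content": "uploaded"}},
    "highlight": {"fields": {"file.content": {}}},
}

print(json.dumps(index_body, indent=2))
print(json.dumps(search_body, indent=2))
```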
However, I run into an issue when I update an entry. For instance, if I run:
POST documents/myDoc/123456789/_update
{
"doc": { "title": "my new title"}
}
After this update, file.content is empty. As far as I understand, this is the expected behavior: the file field is excluded from the _source, so file.content is not in the _source either, and on a partial update Elasticsearch loses any properties that are not in the _source.
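To make sure I describe the behavior correctly, here is a simplified Python model of what I believe happens. It is only my understanding, not Elasticsearch's actual code: the function names are hypothetical, the merge is shallow (top-level keys only), and "indexed_fields" just stands in for whatever ends up searchable.

```python
import copy

def filter_source(doc, excludes):
    """Return a copy of doc without the top-level fields listed in excludes."""
    return {k: copy.deepcopy(v) for k, v in doc.items() if k not in excludes}

def index_document(doc, excludes):
    """Model indexing: every field is indexed, but only the filtered _source is kept."""
    return {
        "indexed_fields": set(doc.keys()),
        "_source": filter_source(doc, excludes),
    }

def partial_update(stored, partial_doc, excludes):
    """Model _update: merge the partial doc into the stored _source, then reindex that."""
    merged = copy.deepcopy(stored["_source"])
    merged.update(partial_doc)
    return index_document(merged, excludes)

excludes = ["file"]
original = {"title": "my title", "file": {"_content": "<base64 data>"}}

stored = index_document(original, excludes)
updated = partial_update(stored, {"title": "my new title"}, excludes)

print(stored["indexed_fields"])   # contains both "title" and "file"
print(updated["indexed_fields"])  # "file" is gone: it was never in _source
```

In this model the update can only reindex what survives in the _source, which is why the extracted text disappears.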
I know I can re-upload the document on every update, but that is not convenient at all. My question is: is there a way to force file.content (the extracted text) to be part of the _source, so that I do not lose the extracted text on a partial update?
Thanks in advance for your answer.