Hi,
I have a problem related to the mapping I use:
PUT documents/myDoc/_mapping
{
  "myDoc": {
    "_source": {
      "excludes": [
        "file"
      ]
    },
    "properties": {
      "title": {
        "type": "string"
      },
      "file": {
        "type": "attachment",
        "fields": {
          "content": {
            "type": "string",
            "term_vector": "with_positions_offsets",
            "store": true
          }
        }
      }
    }
  }
}
I do this so that the raw file content is not kept in the _source, while still being able to retrieve highlighted text from the extracted text. I set file._content to the document content, and then I can query the extracted text through file.content. It works fine.
However, I have an issue when I try to update an entry, for instance with:
POST documents/myDoc/123456789/_update
{
"doc": { "title": "my new title"}
}
After this update, file.content is empty. From what I understand, this is the standard behavior: file is excluded from the _source, so file.content is not in the _source either, and on a partial update Elasticsearch re-indexes the document from the stored _source and therefore loses the properties that are not in it.
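To make my understanding concrete, here is a minimal Python sketch of what I believe happens. It is only a simplified model, not Elasticsearch code: the functions index_document and partial_update, the "extracted:" prefix standing in for the attachment text extraction, and the sample values are all hypothetical.

```python
import copy


def filter_source(doc, excludes):
    """Return a copy of doc without the top-level fields listed in excludes."""
    return {k: copy.deepcopy(v) for k, v in doc.items() if k not in excludes}


def index_document(doc, excludes):
    """Simplified indexing: build the searchable fields from the full doc,
    but keep only the filtered version as the stored _source."""
    indexed = {}
    if "title" in doc:
        indexed["title"] = doc["title"]
    if "file" in doc:
        # Hypothetical stand-in for the attachment plugin's text extraction.
        indexed["file.content"] = "extracted:" + doc["file"]
    return {"_source": filter_source(doc, excludes), "indexed": indexed}


def partial_update(stored, partial, excludes):
    """Simplified partial update: merge the partial doc into the stored
    _source, then re-index the merged result from scratch."""
    merged = {**stored["_source"], **partial}
    return index_document(merged, excludes)


excludes = ["file"]
stored = index_document({"title": "my title", "file": "BASE64DATA"}, excludes)
updated = partial_update(stored, {"title": "my new title"}, excludes)

print(stored["indexed"])   # contains both title and file.content
print(updated["indexed"])  # file.content is gone after the update
```

In this model the merged document never contains file, because it was filtered out of the _source at index time, so nothing is left to extract file.content from when the document is re-indexed.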
I know I can re-upload the document on every update, but that is not convenient at all. My question is: is there a way to force file.content (the extracted text) to be part of the _source, so that I do not lose the extracted text on a partial update?
Thanks in advance for your answer.