Hello everybody,
I hope my question is not too trivial: I have an ES index with a mapping similar to the following (only the relevant part is shown):
{
  "mappings": {
    "person": {
      "_source": {
        "excludes": [
          "websites.body",
          "websites.title",
          "websites.description"
        ]
      },
      "properties": {
        "name": {
          "type": "string"
        },
        "websites": {
          "properties": {
            "url": {
              "type": "string",
              "index": "not_analyzed"
            },
            "body": {
              "type": "string",
              "store": true
            },
            "title": {
              "type": "string",
              "store": true
            },
            "description": {
              "type": "string",
              "store": true
            }
          }
        }
      }
    }
  }
}
Basically, a person may have multiple websites, for which we keep some text fields (title, description, body) that are needed only for searching (there is no need to retrieve them in the _source when querying) and are therefore excluded from _source.
I wrote an Apache Spark application that reads the index, transforms the documents, and writes them back to the same index, using elasticsearch-hadoop 2.1.0.Beta4. Everything works as expected, except that the fields excluded in the mapping are no longer present in the index after I run the job.
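In case it helps, the job boils down to something like the following sketch (simplified; the index/type name people/person, the node address, and the name-trimming transform are just placeholders for my real code, and I'm assuming saveToEsWithMeta is the right way to keep the document ids in this version):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._

val conf = new SparkConf()
  .setAppName("reindex-person")
  .set("es.nodes", "localhost:9200")
val sc = new SparkContext(conf)

// esRDD yields (documentId, fieldsAsMap). The excluded fields are missing from
// _source, so they never make it into the RDD in the first place.
val people = sc.esRDD("people/person")

// Placeholder transformation that only touches the fields I actually care about.
val transformed = people.mapValues(doc => doc + ("name" -> doc("name").toString.trim))

// Write back, preserving the ids; with the default write operation each document
// gets replaced by this (incomplete) source.
transformed.saveToEsWithMeta("people/person")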
At first I thought the reason was the default write operation used by elasticsearch-hadoop, index, which means (from the documentation): "new data is added while existing data (based on its id) is replaced (reindexed)."
I then tried setting es.write.operation=update in my Spark job, which (again from the documentation) "updates existing data (based on its id). If no data is found, an exception is thrown". Before running the job I made sure the websites field was not set on my documents, so that, since it would not be pushed to the index, the old values should have been left untouched. Unfortunately the job still removes the excluded fields from my documents.
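Concretely, the second attempt looks roughly like this (again a simplified sketch building on the snippet above; dropping the websites key stands in for my real logic):

// Drop the websites key entirely and switch to partial updates, hoping the
// untouched (excluded) fields would survive on the existing documents.
val withoutWebsites = transformed.mapValues(doc => doc - "websites")

withoutWebsites.saveToEsWithMeta(
  "people/person",
  Map("es.write.operation" -> "update"))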
How can I update my index so that I can add and modify fields without altering the values of the excluded fields?
Thanks in advance