I am currently looking to optimize the performance of indexing and search on fields with very large text in them.
I have already read the following articles and earlier forum posts:
- Searching in a large text field with Elasticsearch stored fields
- Field data is too large
- What does it mean to “store” a field?
Based on this information I measured performance with the index settings shown below.
{
  "mappings": {
    "_source": {
      "excludes": [
        "content"
      ]
    },
    "properties": {
      "content": {
        "type": "text",
        "store": true
      },
      "author": {
        "type": "keyword"
      },
      "title": {
        "type": "keyword"
      }
    }
  }
}
Happily, with the "store: true" setting, indexing performance, which had been slow when updating documents, is now far better.
We have also confirmed that this setup can handle search api highlighting.
However, the trouble is that the "store: true" field is not migrated in the reindex api with this index setting.
So I looked into the specs of the reindex api.
There I found that documents with "_source" disabled are not supported:
"Reindex requires _source to be enabled for all documents in the source."
I am deliberately excluding "content" from "_source" and enabling "store" on it to optimize indexing and search performance for documents with very large text fields.
I understand that this is why the reindex api does not migrate the content field, as per the spec.
However, if we must reindex in such a case, what should we do?
Please let me know if there is a way to migrate including stored_field.
We are currently considering using Update mapping API as an alternative to re-indexing work.
We also know that the same API can be used to change the settings of the analyzer.
What I was trying to do with the reindex api is to change or add mapping parameters, add or change analyzers, and so on. The documentation states the following restriction:
Except for supported mapping parameters, you can’t change the mapping or field type of an existing field. Changing an existing field could invalidate data that’s already indexed.
Am I correct in assuming that these settings can be changed without using the reindex api?
Also, do I still need to use the reindex api to delete existing fields?
Also, I know that if I have to reindex, I can work around the problem by inserting the stored field values individually after reindexing, for example with the bulk api. If there is a better way, please let me know.
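For reference, here is a minimal sketch of the workaround I have in mind. It assumes the old index is read with a search body that requests both "_source" and the stored "content" field, and that the hits are then converted into a bulk request body for the new index. The helper name `hits_to_bulk_lines`, the index name `new-index`, and the sample hit are all hypothetical, and the sketch only builds request bodies without sending anything to a cluster.

```python
import json

# Search body for reading the old index: ask for _source AND the stored field.
# (When stored_fields is given, _source must be requested explicitly.)
search_body = {
    "_source": True,
    "stored_fields": ["content"],
    "query": {"match_all": {}},
}


def hits_to_bulk_lines(hits, dest_index, stored_field="content"):
    """Build an NDJSON body for the bulk api from search hits.

    Each hit is expected to look like a search response hit: "_id",
    optionally "_source", and optionally "fields" holding stored values.
    """
    lines = []
    for hit in hits:
        doc = dict(hit.get("_source") or {})
        values = hit.get("fields", {}).get(stored_field)
        if values:
            # Stored fields are returned as arrays; unwrap single values.
            doc[stored_field] = values[0] if len(values) == 1 else values
        action = {"index": {"_index": dest_index, "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk api expects a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical hit, shaped like one entry of hits.hits in a search response.
sample_hits = [
    {
        "_id": "1",
        "_source": {"author": "alice", "title": "report"},
        "fields": {"content": ["very large text ..."]},
    }
]

bulk_body = hits_to_bulk_lines(sample_hits, "new-index")
```

The resulting `bulk_body` would be sent to the bulk api of the new index, with the hits read page by page (for example via scroll or search_after) rather than all at once.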
Sorry for so many questions.
Thank you in advance for your help.