Hi team,
(Elasticsearch 7.5)
My scenario is adding a new custom analyzer to an existing index. The analyzer definition might look like this:
{
  "analyzer": {
    "ngram_analyzer": {
      "type": "custom",
      "tokenizer": "standard",
      "filter": [
        "word_delimiter_graph",
        "lowercase",
        "test_ngram_filter_2_10"
      ]
    }
  }
}
The common solutions are either to close and reopen the index, or to reindex the data into a new index. My concerns are the downtime while the index is closed, and the large volume of data that would have to be reindexed (with potential data-inconsistency problems).
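For comparison, the close/reopen approach boils down to three REST calls: `_close`, `_settings` and `_open`. The Java sketch below only builds those requests and does not send them. The index name `my_index` and the `ngram` filter parameters (min_gram 2, max_gram 10, guessed from the filter name `test_ngram_filter_2_10`) are assumptions. Because that gram range exceeds the 7.x default `index.max_ngram_diff` of 1, the sketch also raises that setting.

```java
import java.util.ArrayList;
import java.util.List;

// Builds (but does not send) the REST calls for the close / update settings / reopen approach.
// The index name and the ngram filter parameters are assumptions for illustration.
class CloseReopenPlan {

    static final class Step {
        final String method;
        final String path;
        final String body; // null when the request has no body

        Step(String method, String path, String body) {
            this.method = method;
            this.path = path;
            this.body = body;
        }

        @Override
        public String toString() {
            return body == null ? method + " " + path : method + " " + path + "\n" + body;
        }
    }

    static List<Step> addAnalyzerSteps(String index) {
        // Assumed filter definition: min_gram 2 / max_gram 10, guessed from the filter name.
        // index.max_ngram_diff (default 1 in 7.x) must be at least max_gram - min_gram = 8.
        String settings = String.join("\n",
            "{",
            "  \"index\": {",
            "    \"max_ngram_diff\": 8,",
            "    \"analysis\": {",
            "      \"filter\": {",
            "        \"test_ngram_filter_2_10\": { \"type\": \"ngram\", \"min_gram\": 2, \"max_gram\": 10 }",
            "      },",
            "      \"analyzer\": {",
            "        \"ngram_analyzer\": {",
            "          \"type\": \"custom\",",
            "          \"tokenizer\": \"standard\",",
            "          \"filter\": [\"word_delimiter_graph\", \"lowercase\", \"test_ngram_filter_2_10\"]",
            "        }",
            "      }",
            "    }",
            "  }",
            "}");
        List<Step> steps = new ArrayList<>();
        steps.add(new Step("POST", "/" + index + "/_close", null));      // reads and writes are blocked from here
        steps.add(new Step("PUT", "/" + index + "/_settings", settings)); // new analyzers require a closed index
        steps.add(new Step("POST", "/" + index + "/_open", null));        // downtime lasts until shards recover
        return steps;
    }

    public static void main(String[] args) {
        for (Step step : addAnalyzerSteps("my_index")) {
            System.out.println(step);
        }
    }
}
```

The downtime concern sits between the first and last step: the index stays unavailable until it is reopened and its shards have recovered.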
So I'm exploring whether there are alternative ways to add new analyzers.
Looking at the AnalysisPlugin approach to adding a new analyzer: it only requires restarting all ES nodes, after which the new analyzer can be referenced in an existing index, with no need to close or reindex that index.
So my understanding is:
- Whether we restart an ES node (for the AnalysisPlugin approach) or close/open the index (for a custom analyzer in the index settings), the key action behind it is initializing the corresponding IndexService instances. This populates all supported analyzers from the latest plugins and index settings, and re-initializes the underlying mapperService and Lucene engine to use those analyzers.
- When restarting nodes one by one to apply a new AnalysisPlugin, the set of supported analyzers may differ across shards (restarted nodes have the new analyzers from the plugin, while nodes not yet restarted do not). This is fine as long as we don't use the new analyzers until the whole cluster has been restarted.
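The second point can be illustrated with a toy model of a rolling restart. This is plain Java, not Elasticsearch code: each node simply holds the set of analyzer names it can build, and it assumes any node may end up hosting a shard of the index. Node names and analyzer names are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model of a rolling restart, NOT Elasticsearch code: each node holds the set of
// analyzer names it can build. Node and analyzer names are hypothetical.
class RollingRestartModel {

    private final Map<String, Set<String>> analyzersByNode = new LinkedHashMap<>();

    // Register a node with the analyzers it currently knows (copied so the set stays mutable).
    void addNode(String node, Set<String> analyzers) {
        analyzersByNode.put(node, new HashSet<>(analyzers));
    }

    // Restarting one node with the new plugin adds the plugin's analyzers on that node only.
    void restartWithPlugin(String node, Set<String> pluginAnalyzers) {
        analyzersByNode.get(node).addAll(pluginAnalyzers);
    }

    // Assumes any node may host a shard of the index, so the analyzer is only
    // safe to reference in a mapping once every node can build it.
    boolean safeToUse(String analyzer) {
        for (Set<String> analyzers : analyzersByNode.values()) {
            if (!analyzers.contains(analyzer)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RollingRestartModel cluster = new RollingRestartModel();
        cluster.addNode("node-1", Set.of("standard"));
        cluster.addNode("node-2", Set.of("standard"));
        cluster.restartWithPlugin("node-1", Set.of("ngram_analyzer"));
        System.out.println("after node-1: " + cluster.safeToUse("ngram_analyzer"));
        cluster.restartWithPlugin("node-2", Set.of("ngram_analyzer"));
        System.out.println("after node-2: " + cluster.safeToUse("ngram_analyzer"));
    }
}
```

Running `main` prints `false` after the first node and `true` after the second, which is the "don't use it until the whole cluster is restarted" rule in code form.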
Could you confirm whether the above understanding is correct?
If so, I'm considering the following two potential solutions:
- We write an AnalysisPlugin that parses an analysis JSON payload from a config source and generates the analyzers/tokenizers/etc. To add new custom analyzers, we only need to update the config and restart the ES cluster; the AnalysisPlugin itself does not need to be rebuilt. Since ES nodes are restarted one by one, there is no downtime.
  Cons: 1) we have to wait until all ES nodes are restarted before using the new analyzer; 2) the analyzers are per-cluster, not per-index (so analyzer naming conflicts within the cluster must be avoided).
- Write a plugin that uses pluginService.onIndexModule to add the new analyzer definition to the index settings before the IndexModule populates all analyzers from the index settings.
  Pros: it reuses the same ES parsing logic that builds analyzers from the analysis settings, and the plugin logic can effectively achieve per-index analyzers.
  It still needs a two-step rollout: first install the plugin on all nodes, then add a new field that uses the new analyzer to the index mapping.
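The config-handling core shared by both options can be sketched in plain Java, independent of the real plugin APIs. `parse` turns a hypothetical properties-style config into flat `index.analysis.*` keys and rejects analyzer names already in use (the per-cluster naming concern of the first option); `mergeIntoIndex` adds those keys to a single index's settings without overriding existing ones (roughly what the second option would do per index). The config format, the class and method names, and the `.0`/`.1` list-key layout are all assumptions, and hooking this into an actual AnalysisPlugin or `onIndexModule` is not shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java sketch of the config handling behind both options; it does not use any
// Elasticsearch API. The properties format and its key layout are hypothetical.
class AnalyzerConfig {

    // Option 1: read a hypothetical config into flat "index.analysis.*" keys and reject
    // analyzer names that are already taken (analyzers are per-cluster in that option).
    static Map<String, String> parse(String config, Set<String> existingAnalyzers) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(config));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> settings = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            // e.g. "analyzer.ngram_analyzer.tokenizer" -> ["analyzer", "ngram_analyzer", "tokenizer"]
            String[] parts = key.split("\\.", 3);
            if (parts.length == 3 && parts[0].equals("analyzer") && existingAnalyzers.contains(parts[1])) {
                throw new IllegalArgumentException("analyzer name already in use: " + parts[1]);
            }
            settings.put("index.analysis." + key, props.getProperty(key));
        }
        return settings;
    }

    // Option 2: add the extra keys to one index's settings without overriding keys the
    // index already defines; applying it only to selected indices gives per-index analyzers.
    static Map<String, String> mergeIntoIndex(Map<String, String> indexSettings, Map<String, String> extra) {
        Map<String, String> merged = new TreeMap<>(indexSettings);
        extra.forEach(merged::putIfAbsent);
        return merged;
    }

    public static void main(String[] args) {
        String config = String.join("\n",
            "filter.test_ngram_filter_2_10.type=ngram",
            "filter.test_ngram_filter_2_10.min_gram=2",
            "filter.test_ngram_filter_2_10.max_gram=10",
            "analyzer.ngram_analyzer.type=custom",
            "analyzer.ngram_analyzer.tokenizer=standard",
            "analyzer.ngram_analyzer.filter.0=word_delimiter_graph",
            "analyzer.ngram_analyzer.filter.1=lowercase",
            "analyzer.ngram_analyzer.filter.2=test_ngram_filter_2_10");
        Map<String, String> extra = parse(config, Set.of("standard", "whitespace"));
        Map<String, String> settings = mergeIntoIndex(Map.of("index.number_of_shards", "1"), extra);
        settings.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Using `putIfAbsent` is a deliberate choice in this sketch: if an index already defines an analyzer with the same name, the index's own definition wins rather than being silently replaced by the plugin.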
Would these options work, or am I still missing something?
I'd appreciate any suggestions or comments!
Best Regards