Some thoughts on adding a new analyzer to an existing index without reindexing or closing it

garyzjq · September 27, 2021, 2:42pm

Hi team,

(Elasticsearch 7.5)

My scenario is adding a new custom analyzer to an existing index. The analyzer definition might look like this:

{
    "analyzer": {
        "ngram_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
                "word_delimiter_graph",
                "lowercase",
                "test_ngram_filter_2_10"
            ]
        }
    }
}

The common solutions are either to close and reopen the index, or to reindex the data into a new index. My concerns are the downtime while the index is closed, and the large volume of data to reindex (with a potential data-inconsistency problem).
So I'm probing whether there are alternative ways to add new analyzers.
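For comparison, the close/reopen route amounts to three REST calls. The sketch below only builds them as `(method, path, body)` tuples and sends nothing. The index name `my-index` and the definition of `test_ngram_filter_2_10` (an `ngram` filter with `min_gram` 2 and `max_gram` 10, guessed from its name) are assumptions, since the post does not show them. If that guess is right, note that in 7.x `index.max_ngram_diff` defaults to 1, so the same settings update would also have to raise it to at least 8.

```python
import json

INDEX = "my-index"  # hypothetical index name

# The analyzer from the post, plus an ASSUMED definition of
# test_ngram_filter_2_10 (ngram filter, min_gram=2, max_gram=10).
ANALYSIS = {
    "filter": {
        "test_ngram_filter_2_10": {"type": "ngram", "min_gram": 2, "max_gram": 10}
    },
    "analyzer": {
        "ngram_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["word_delimiter_graph", "lowercase", "test_ngram_filter_2_10"],
        }
    },
}


def required_ngram_diff(analysis):
    """Largest max_gram - min_gram over the ngram filters in `analysis`."""
    diffs = [
        f["max_gram"] - f["min_gram"]
        for f in analysis.get("filter", {}).values()
        if f.get("type") == "ngram"
    ]
    return max(diffs, default=1)  # 1 is the 7.x default for index.max_ngram_diff


def close_update_open(index, analysis):
    """Return the close -> update settings -> open sequence as
    (method, path, body) tuples; nothing is sent to a cluster."""
    settings_body = {
        "index": {
            "max_ngram_diff": required_ngram_diff(analysis),
            "analysis": analysis,  # analysis settings are static: index must be closed
        }
    }
    return [
        ("POST", f"/{index}/_close", None),  # index stops serving reads/writes
        ("PUT", f"/{index}/_settings", settings_body),
        ("POST", f"/{index}/_open", None),  # shards re-initialize with the new analyzer
    ]


if __name__ == "__main__":
    for method, path, body in close_update_open(INDEX, ANALYSIS):
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```

Between the `_close` and `_open` calls the index rejects reads and writes, which is exactly the downtime concern above.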

With an AnalysisPlugin, adding a new analyzer only requires restarting all ES nodes; after that the new analyzer can be referenced from an existing index without closing the index or reindexing.
So my understanding is as follows:

1. Whether you restart an ES node (the AnalysisPlugin way) or close/reopen the index (a custom analyzer in the index settings), the key action behind it is initializing the corresponding IndexService instances. This populates all supported analyzers from the latest plugins and index settings, and re-initializes the underlying mapperService and Lucene engine to use those analyzers.
2. When restarting nodes one by one to apply a new AnalysisPlugin, the set of supported analyzers may differ across shards (restarted nodes have the new analyzers from the plugin, while nodes not yet restarted don't). That is fine as long as we don't use the new analyzers until the whole cluster has been restarted.
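The second point can be turned into a pre-flight check: only reference the new analyzer once every node reports the plugin. A minimal sketch, assuming the `GET _nodes/plugins` response has a `nodes` object keyed by node ID, where each entry has a `name` and a `plugins` list of objects with a `name` field; the plugin name `my-analysis-plugin` is hypothetical.

```python
def nodes_missing_plugin(nodes_info, plugin_name):
    """Return the sorted names of nodes whose plugin list lacks plugin_name.

    `nodes_info` is assumed to be the parsed JSON of GET _nodes/plugins.
    """
    missing = []
    for node_id, node in nodes_info.get("nodes", {}).items():
        installed = {p.get("name") for p in node.get("plugins", [])}
        if plugin_name not in installed:
            missing.append(node.get("name", node_id))
    return sorted(missing)


def safe_to_use_analyzer(nodes_info, plugin_name):
    # Only reference the new analyzer once the rolling restart has reached
    # every node; otherwise a shard could be allocated to a node without it.
    return not nodes_missing_plugin(nodes_info, plugin_name)


if __name__ == "__main__":
    # Hypothetical mid-rollout response: es-node-2 has not been restarted yet.
    sample = {
        "nodes": {
            "a1": {"name": "es-node-1", "plugins": [{"name": "my-analysis-plugin"}]},
            "b2": {"name": "es-node-2", "plugins": []},
        }
    }
    print(nodes_missing_plugin(sample, "my-analysis-plugin"))
    print(safe_to_use_analyzer(sample, "my-analysis-plugin"))
```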

Could you confirm whether my understanding above is correct?

If so, I'm considering the following two potential solutions:

1. Write an AnalysisPlugin that parses an analysis JSON payload from a config file and generates analyzers/tokenizers/etc. To add new custom analyzers, we just update the config and restart the ES cluster, without rebuilding the AnalysisPlugin.
Since ES nodes are restarted one by one, there is no downtime.
Cons: 1) we need to wait for all ES nodes to be restarted before using the new analyzer; 2) the analyzers are per-cluster, not per-index (so analyzer naming conflicts across the cluster must be avoided).
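For option 1, the plugin's config file could reuse the layout of an index-level `analysis` block, and Cons 2 (cluster-wide names) suggests validating it before each rolling restart. A minimal sketch in Python, not plugin code: the config layout and the idea of comparing it with each index's `analysis` settings (for example taken from `GET /<index>/_settings`) are assumptions, not part of any ES API.

```python
import json

# Hypothetical plugin config: same layout as an index-level "analysis" block.
PLUGIN_CONFIG = """
{
  "analyzer": {
    "ngram_analyzer": {
      "type": "custom",
      "tokenizer": "standard",
      "filter": ["word_delimiter_graph", "lowercase", "test_ngram_filter_2_10"]
    }
  },
  "filter": {
    "test_ngram_filter_2_10": {"type": "ngram", "min_gram": 2, "max_gram": 10}
  }
}
"""

COMPONENT_TYPES = ("analyzer", "tokenizer", "filter", "char_filter", "normalizer")


def name_conflicts(plugin_analysis, index_analyses):
    """Return (index, component_type, name) for every component name that is
    defined both in the cluster-wide plugin config and in an index's settings."""
    conflicts = []
    for index, analysis in sorted(index_analyses.items()):
        for kind in COMPONENT_TYPES:
            shared = set(plugin_analysis.get(kind, {})) & set(analysis.get(kind, {}))
            conflicts.extend((index, kind, name) for name in sorted(shared))
    return conflicts


if __name__ == "__main__":
    plugin_analysis = json.loads(PLUGIN_CONFIG)
    # Hypothetical per-index analysis settings.
    indices = {
        "products": {"analyzer": {"ngram_analyzer": {"type": "custom", "tokenizer": "whitespace"}}},
        "logs": {"filter": {"my_stop": {"type": "stop"}}},
    }
    for conflict in name_conflicts(plugin_analysis, indices):
        print("conflict:", conflict)
```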
2. Write a plugin that uses the pluginService.onIndexModule hook to add the new analyzer definition to the index settings before IndexModule starts populating analyzers from the index settings.
Pros: it reuses the same ES parsing logic to build analyzers from analysis settings, and the plugin logic could achieve per-index analyzers.
It still needs a two-step rollout: first install the plugin on all nodes, then add a new field that uses the new analyzer to the index mapping.
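The second rollout step is an ordinary mapping change. Existing fields can't switch analyzer, but a new field, or a new multi-field under an existing `text` field, can reference the new analyzer. Documents already in the index are not indexed into that field until they are re-indexed in place, for example with `_update_by_query`. The sketch only builds those two requests; the field names `title` and `title.ngram` are hypothetical, and any non-default parameters of the existing field would need to be repeated in the mapping body, otherwise the update may be rejected as a conflict.

```python
def add_ngram_subfield(index, field, subfield, analyzer):
    """Build requests that add <field>.<subfield> analyzed with `analyzer`,
    then re-index existing documents in place so they populate it.
    Returns (method, path, body) tuples; nothing is sent to a cluster."""
    mapping_body = {
        "properties": {
            field: {
                "type": "text",  # must match the existing field's type
                "fields": {subfield: {"type": "text", "analyzer": analyzer}},
            }
        }
    }
    return [
        ("PUT", f"/{index}/_mapping", mapping_body),
        # Without a query body, every document is re-indexed in place;
        # conflicts=proceed skips version conflicts instead of aborting.
        ("POST", f"/{index}/_update_by_query?conflicts=proceed", None),
    ]


if __name__ == "__main__":
    for method, path, body in add_ngram_subfield("my-index", "title", "ngram", "ngram_analyzer"):
        print(method, path, body)
```

On a TB-scale index, the `_update_by_query` step rewrites every document, so its cost is closer to a reindex than to a mapping change.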

Would these options work, or am I missing something?
I'd appreciate any suggestions or comments!

Best Regards

warkolm · September 27, 2021, 11:59pm

Welcome to our community!

Elasticsearch 7.5 is EOL, please upgrade!

It seems like you're really trying to reinvent the wheel here. Why would you go to all the trouble of writing and maintaining plugins to do this? How large is your dataset?

garyzjq · September 28, 2021, 5:04am

Thanks for your reply, Mark!

Our data is at the terabyte scale.
I'm still exploring all potential solutions and am not ruling out reindexing or close/open. Anything is possible in code; it's just a matter of comparing the pros and cons.

What prompted me to dig into the solutions above is that adding a new analyzer through an AnalysisPlugin doesn't require closing/reopening the index or reindexing, while adding one through the index settings does. So I'm curious why these two user experiences differ, and whether there's a technical blocker that prevents ES from supporting adding an analyzer in the index settings without closing the whole index at once.

From reading the source code and some quick experiments, my understanding is that closing the full index isn't technically required (restarting the shards, in practice the nodes, one by one also takes effect). So I'd like to ask you and the community whether this approach is functionally correct (setting user experience and plugin effort aside for now).

(BTW, do you know of any future plans, or blockers, for supporting adding a new analyzer to an existing index in real time (e.g. through the Update Index Settings API)? It looks like a convenient feature for index management, but I haven't found much discussion about it in the community.)

warkolm · September 28, 2021, 5:44am

Sorry, I don't know enough about that level; hopefully someone else can chime in. I wanted to better understand the broader context of what you are trying to solve.

garyzjq · September 28, 2021, 4:19pm

Still, thanks a lot for replying, Mark. What I'm looking for is the most lightweight way to add new analyzers to existing, running indices, to give us more flexibility as we enable more and more full-text matching scenarios.
If possible, do you know who is familiar with the analysis internals so I can consult them further? I'm very interested in the underlying ES mechanism.

garyzjq · October 10, 2021, 12:19pm

Pinging to keep this thread active. Any suggestions or comments are appreciated!

system · November 7, 2021, 12:20pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics:
- Ask for best practice how to add new plugin analyzer to existing index (new field) (January 14, 2022)
- Adding custom analyzer after index creation (July 6, 2017)
- Custom Analyzers in 5.X After Node based in 2.X (October 26, 2017)
- Adding analysers to an ES Index without downtime (October 27, 2020)
- Change/specify custom analyzer to existing index or how to reindex with custom analyzer (October 28, 2024)