Some thoughts on adding a new analyzer to an existing index without reindexing or closing it

Hi team,

(Elasticsearch 7.5)

My scenario is to add a new custom analyzer to an existing index. The analyzer definition may look like this:

{
    "analyzer": {
        "ngram_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
                "word_delimiter_graph",
                "lowercase",
                "test_ngram_filter_2_10"
            ]
        }
    }
}

The common solutions are either to close/reopen the index or to reindex the data into a new index. My concerns are the downtime while the index is closed, and the large volume of data to be reindexed (with a potential data inconsistency problem).
So I'm probing whether there are alternative options for adding new analyzers.
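
For reference, the close/reopen route can be sketched as the three REST calls below. This is only a sketch under assumptions: `my-index` is a placeholder index name, and because the definition of `test_ngram_filter_2_10` is not shown here, the `ngram` filter with `min_gram` 2 and `max_gram` 10 is a guess from its name. The `max_ngram_diff` value is included because, as far as I know, 7.x rejects ngram filters whose gram range exceeds that setting (default 1). The helper only builds the requests, it does not send them.

```python
import json

INDEX = "my-index"  # placeholder index name (assumption)

# Settings to add. The filter body is an assumption inferred from the filter's
# name; its real definition is not shown in this thread.
SETTINGS = {
    "index": {
        # Assumed to be needed so that max_gram - min_gram = 8 is accepted.
        "max_ngram_diff": 8,
        "analysis": {
            "filter": {
                "test_ngram_filter_2_10": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 10,
                }
            },
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "word_delimiter_graph",
                        "lowercase",
                        "test_ngram_filter_2_10",
                    ],
                }
            },
        },
    }
}


def close_update_open(index, settings):
    """Return the (method, path, body) calls for the close/update/open route."""
    return [
        ("POST", f"/{index}/_close", None),        # index is unavailable from here
        ("PUT", f"/{index}/_settings", settings),  # analysis settings are not dynamic
        ("POST", f"/{index}/_open", None),         # analyzers are rebuilt on open
    ]


if __name__ == "__main__":
    for method, path, body in close_update_open(INDEX, SETTINGS):
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```

The downtime I'm worried about is the window between the first and the last call.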

If we look at how an AnalysisPlugin adds a new analyzer, it only requires all ES nodes to be restarted; after that the new analyzer can be referenced in an existing index, with no need to close the index or reindex.
So my understanding is:

  1. Whether we restart the ES nodes (the AnalysisPlugin way) or close/open the index (the custom analyzer in index settings way), the key action underneath is re-initializing the corresponding IndexService instances. That populates all supported analyzers from the latest plugins and index settings, and re-initializes the underlying mapperService and Lucene engine to use those analyzers.
  2. When restarting nodes one by one to apply a new AnalysisPlugin, the supported analyzers may differ across shards (restarted nodes have the new analyzers from the plugin, nodes not yet restarted do not). But that is OK as long as we don't use the new analyzers until the whole cluster has been restarted.
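
To make point 2 concrete, here is a toy model of a rolling restart. It is purely illustrative and does not touch Elasticsearch: the node names and the per-node "set of analyzers" are hypothetical stand-ins for what each node can resolve after loading its plugins.

```python
def rolling_restart(nodes, new_analyzer):
    """Restart nodes one by one; yield (node, snapshot) after each restart."""
    for name in nodes:
        # A restarted node loads the updated plugin and gains the analyzer.
        nodes[name] = nodes[name] | {new_analyzer}
        yield name, dict(nodes)


def usable_everywhere(nodes, analyzer):
    """True only when every node in the snapshot can resolve the analyzer."""
    return all(analyzer in analyzers for analyzers in nodes.values())


if __name__ == "__main__":
    cluster = {
        "node-1": {"standard"},
        "node-2": {"standard"},
        "node-3": {"standard"},
    }
    for restarted, snapshot in rolling_restart(cluster, "ngram_analyzer"):
        print(restarted, usable_everywhere(snapshot, "ngram_analyzer"))
```

In this model the new analyzer only becomes usable everywhere after the last node has restarted, which is the window in which I assume it must not be referenced.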

Could you confirm whether the above understanding is correct?

If yes, I'm thinking of the 2 potential solutions below:

  1. Write an AnalysisPlugin that parses an analysis JSON payload from some config and generates the analyzers/tokenizers/etc. When we need to add new custom analyzers, we just update the config and restart the ES cluster; there is no need to rebuild the AnalysisPlugin.
    Since the ES nodes are restarted one by one, there is no downtime.
    Cons: 1) we need to wait until all ES nodes have been restarted before using the new analyzer; 2) the analyzers are per-cluster, not per-index (so we need to avoid analyzer naming conflicts within the cluster).

  2. Write a plugin that uses pluginService.onIndexModule to add the new analyzer definition to the index settings before IndexModule starts to populate all analyzers from those settings.
    The pro is that it reuses the same parsing logic ES uses to populate analyzers from analysis settings, and the plugin logic can achieve something like per-index analyzers.
    It still needs a 2-step rollout: first install the plugin on all nodes, then add a new field that uses the new analyzer to the index mapping.
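
The second rollout step (the same for both options) could look roughly like the sketch below: an `_analyze` call to check that the index can resolve the analyzer, followed by a put-mapping call that adds the new field. The index name `my-index`, the field name `title_ngram`, and the sample text are placeholders, and the helpers only build the requests rather than sending them.

```python
import json


def analyze_request(index, analyzer, text):
    """Build an _analyze call to check that the index resolves `analyzer`."""
    return ("GET", f"/{index}/_analyze", {"analyzer": analyzer, "text": text})


def add_field_request(index, field, analyzer):
    """Build the put-mapping call that adds a new text field using `analyzer`."""
    body = {"properties": {field: {"type": "text", "analyzer": analyzer}}}
    return ("PUT", f"/{index}/_mapping", body)


if __name__ == "__main__":
    steps = [
        # Run only after every node has the plugin installed and was restarted.
        analyze_request("my-index", "ngram_analyzer", "Quick-Brown fox"),
        add_field_request("my-index", "title_ngram", "ngram_analyzer"),
    ]
    for method, path, body in steps:
        print(method, path)
        print(json.dumps(body, indent=2))
```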

Would these options work, or am I still missing something?
I'd appreciate any suggestions/comments!

Best Regards

Welcome to our community! :smiley:

Elasticsearch 7.5 is EOL, please upgrade!

It seems like you're really trying to reinvent the wheel here. Why would you go to all the trouble of writing and maintaining plugins to do this? How large is your dataset?

Thanks for your reply, Mark!

Our data is at the TB level.
Currently I'm still probing all potential solutions, and I'm not ruling out the reindex or close/open approaches. Anything is possible in the code world; it's just a matter of comparing pros and cons :slight_smile:

What triggered me to probe further into the solutions above is that adding a new analyzer the AnalysisPlugin way doesn't require an index close/open or a reindex, but doing it through an index settings update does. So I'm curious why these 2 user experiences differ, and whether there is any technical blocker to ES supporting adding an analyzer in index settings without closing the whole index at once.

From reading the source code and some quick experiments, my understanding is that technically it is not a must to close the full index (restarting the shards, or actually the nodes, one by one also takes effect). So I'd like to draw on your and the community's expertise to check whether this approach is functionally correct (let's put user experience and plugin effort aside for now).

(BTW, do you know of any future plan for, or blocker to, supporting adding a new analyzer to an existing index in real time, e.g. through the index settings update API? It looks like a convenient feature for index management, but I couldn't find much discussion of it in the community.)

Sorry, I don't know enough at that level; hopefully someone else can chime in there. I wanted to better understand the broader context of what you are trying to solve.

Thanks a lot for replying anyway, Mark. What I'm looking for is the most lightweight way possible to add new analyzers to existing, running indices, to give us more flexibility as we enable more and more full-text match scenarios.
If possible, could you tell me who is familiar with the analysis part, so that I can consult them further? I'm very interested in the underlying ES mechanisms.

Pinging to keep this thread active. Any suggestions/comments are appreciated!