Reloading dictionary decompounder word_list_path

I've defined my own dictionary_decompounder word list to cater for special words that are not matched by the German hyphenation_decompounder:

{
    "version": 3,
    "_meta": {
        "description": "Default index settings with German compound splitter analyzer (splits compound nouns)"
    },
    "priority": 500,
    "index_patterns": [
        "elasticsearch_index_drupal*_content_index"
    ],

    "template": {
        "settings": {
            "number_of_shards": 1,

            "analysis": {
                "filter": {
                    "german_filter": {
                        "type": "hyphenation_decompounder",
                        "word_list_path": "analyzer/dictionary-de.txt",
                        "hyphenation_patterns_path": "analyzer/de_DR.xml",
                        "only_longest_match": true,
                        "min_subword_size": 4
                    },
                    "german_filter_custom": {
                        "type": "dictionary_decompounder",
                        "word_list": ["bot", "chat", "gpt"]
                    },
     [...]

Changing this word list means updating the index template and then dropping and re-creating all indices. That is fine for rare changes, but tedious when done often. (I could instead update the settings of each index directly, but I have many indices and don't want to update them all.)

I thought about using word_list_path instead of word_list, so that new subwords could be written to that file and the indices would get them without any change to their settings.

Now my question: Does Elasticsearch 7.17 pick up changes to the word list files automatically, or do I have to restart ES to get changes to them applied?


Keyword: hot reloading

No, it does not pick up changes to the file automatically. But you do not have to restart ES completely: closing and then reopening the index should make it re-read the file.

curl -XPOST "http://localhost:9200/test_index/_close"
curl -XPOST "http://localhost:9200/test_index/_open"

That's probably the best way I can think of offhand if you don't want to update the word lists across all of your indices. Note that a closed index cannot be searched or written to until it has reopened, and reopening causes some resource churn, so there are some downsides to this approach. It should still be faster than a restart, though.
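
With many indices, the close/reopen pair can be wrapped in a small script. This is only a sketch: the `run` and `reopen_index` helpers and the `DRY_RUN` and `ES_URL` variables are names made up for illustration, and the address is assumed to be the same localhost:9200 as in the curl commands above. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: close and reopen every index named on the command line,
# so that the word list file is read again when the index reopens.
ES_URL="${ES_URL:-http://localhost:9200}"   # assumed address of the cluster
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Close the index, then reopen it only if the close succeeded.
reopen_index() {
    run curl -fsS -XPOST "$ES_URL/$1/_close" &&
    run curl -fsS -XPOST "$ES_URL/$1/_open"
}

for index in "$@"; do
    reopen_index "$index" || echo "failed: $index" >&2
done
```

Saved under a name of your choice, it would be called with the index names as arguments and `DRY_RUN=0` once the printed commands look right. The file has to be updated on every node first.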

You might consider scripting out the word_list updates for all of your indices as an alternative.
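
A rough sketch of such a script follows. Analysis settings can only be changed while an index is closed, so each index is closed, updated through the update settings API, and reopened. The helper names, `DRY_RUN`, `ES_URL`, and the example word list are assumptions for illustration; the filter name is taken from the template in the question. Documents that are already indexed are not re-analyzed by this, so if the filter is applied at index time a reindex would still be needed for old documents.

```shell
#!/bin/sh
# Hypothetical helper: push a new word_list to every index named on the command line.
ES_URL="${ES_URL:-http://localhost:9200}"   # assumed address of the cluster
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands, 0 = execute them
FILTER_NAME="german_filter_custom"          # filter name from the template above
WORD_LIST='["bot", "chat", "gpt"]'          # example list, replace with the new words

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build the JSON body for the update settings request.
settings_body() {
    printf '{"analysis":{"filter":{"%s":{"type":"dictionary_decompounder","word_list":%s}}}}' \
        "$FILTER_NAME" "$WORD_LIST"
}

# Close the index, update the filter, and reopen the index even if the update failed,
# so that a bad request does not leave the index closed.
update_word_list() {
    run curl -fsS -XPOST "$ES_URL/$1/_close" || return 1
    run curl -fsS -XPUT "$ES_URL/$1/_settings" \
        -H 'Content-Type: application/json' -d "$(settings_body)"
    status=$?
    run curl -fsS -XPOST "$ES_URL/$1/_open" || return 1
    return "$status"
}

for index in "$@"; do
    update_word_list "$index" || echo "failed: $index" >&2
done
```

The index names could come from something like `curl "$ES_URL/_cat/indices/elasticsearch_index_drupal*_content_index?h=index"`. The index template still needs the same change so that newly created indices get the new list.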

Thank you!