Illegal_argument_exception while creating new index

Hi. I'm running into issues when trying to create a new index in Elasticsearch 7.5.

The error message is:

The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [18]. This limit can be set by changing the [index.max_ngram_diff] index level setting.

I'm learning about custom analyzers, so it could be something I'm unknowingly doing wrong. I've read about this issue in two other topics, here and here. If it's as simple as changing index.max_ngram_diff, that's fine, but the linked questions point to a case where that didn't seem to work as the user expected.

I'm most curious about where to get more info & learn rather than a spoon-fed answer. Can anyone point me in the right direction?

Here is the analysis section of my PUT request.

        "analysis": {
            "analyzer": {
                "case_insensitive_sort": {
                    "filter": [
                        "lowercase"
                    ],
                    "tokenizer": "keyword"
                },
                "email": {
                    "filter": [
                        "email",
                        "lowercase",
                        "unique"
                    ],
                    "tokenizer": "uax_url_email"
                },
                "ngram_filter_analyzer": {
                    "type": "custom",
                    "filter": [
                        "lowercase",
                        "ngram_filter"
                    ],
                    "tokenizer": "standard"
                }
            },
            "filter": {
                "email": {
                    "preserve_original": "1",
                    "type": "pattern_capture",
                    "patterns": [
                        "([^@]+)",
                        "(\\p{L}+)",
                        "(\\d+)",
                        "@(.+)",
                        "([^-@]+)"
                    ]
                },
                "ngram_filter": {
                    "min_gram": "2",
                    "type": "nGram",
                    "max_gram": "20"
                }
            }
        }

The error message states that the difference between min_gram and max_gram is 18, i.e. 20 - 2, which matches the min_gram and max_gram values in your ngram_filter.

The message also names the setting that changes this limit, index.max_ngram_diff. It has to be set explicitly, because raising it means you will index a lot of terms, so you should know what you are doing. See https://www.elastic.co/guide/en/elasticsearch/reference/7.5/index-modules.html#index-max-ngram-diff
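As a rough sketch of what that could look like, the request body below raises index.max_ngram_diff to 18 in the same create-index request that defines the filter. It assumes a hypothetical index name (for example PUT /my-index) and shows only the ngram-related parts of your analysis section. The filter type is written as "ngram" here, on the assumption that the "nGram" spelling is deprecated in 7.x; adjust if your version behaves differently.

```json
{
  "settings": {
    "index": {
      "max_ngram_diff": 18
    },
    "analysis": {
      "analyzer": {
        "ngram_filter_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "ngram_filter"]
        }
      },
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      }
    }
  }
}
```

If you would rather stay within the default limit of 1, the alternative is to keep max_gram at most one above min_gram (for example min_gram 2 and max_gram 3), which needs no extra setting.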
