Issues creating custom_analyzer

I'm currently using Python along with Elasticsearch.
My data has only two columns:

  • "sms_id"
  • "msg_txt"

I'm creating a custom n-gram analyzer and referencing it in my mapping for the "msg_txt" field only.

I'm not sure why I'm facing errors. Could someone help?
Thank you!

Warmest Regards,
Min Han

Min Han,

It is hard for me to be sure from the screenshot, but it looks like you have defined the tokenizer directly under the settings block rather than inside the analysis block. That is, it looks like you have:

{
    "settings": {
        [...]
        "analysis": {
            "analyzer": {
                "my_ngram_analyzer": {
                    [...]
                }
            }
        },
        "tokenizer": {
            "ngram_tokenizer": {
                [...]
            }
        }
    },
    "mapping": {
        [...]
    }
}

...when you should have:

{
    "settings": {
        [...]
        "analysis": {
            "analyzer": {
                "my_ngram_analyzer": {
                    [...]
                }
            },
            "tokenizer": {
                "ngram_tokenizer": {
                    [...]
                }
            }
        }
    },
    "mapping": {
        [...]
    }
}

I hope this helps. One thing that might be useful to you on these forums is to paste code samples as text rather than as screenshots; it makes it easier for other community members to check your code. Just highlight your code and click the </> icon before you post, so that your code is well formatted.

-William


Thanks for taking the time to reply to me, William!

I'm still receiving an error after making the amendments. By the way, I'm using the Python Elasticsearch client.

response = es.indices.create(
    index='custom_analyzer',
    body={
        "settings": {
            "number_of_shards": 5,
            "number_of_replicas": 0,
            "analysis": {
                "analyzer": {
                    "my_ngram_analyzer": {
                        "filter": ["lowercase"],
                        "tokenizer": "ngram_tokenizer"
                    }
                },
                "tokenizer": {
                    "ngram_tokenizer": {
                        "type": "n_gram",
                        "min_gram": 3,
                        "max_gram": 3,
                        "token_chars": [
                            "letter",
                            "digit"
                        ]
                    }
                }
            }
        },
        "mapping": {
            "properties": {
                "msg_txt": {
                    "type": "text",
                    "analyzer": "ngram_tokenizer"
                }
            }
        }
    }
)

print(response)

Dear William,

Thanks for your help, but I've already resolved the error. I needed an s in "mapping" (it should be "mappings"), "ngram" rather than "n_gram" for the tokenizer type, and the analyzer on the "msg_txt" field should of course have been "my_ngram_analyzer", not the tokenizer name.
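
For anyone who finds this thread later, here is a sketch of the working call after those three fixes, based on my code above (I've also added "type": "custom" to the analyzer definition for explicitness; this assumes a 7.x-style mapping with no document type):

response = es.indices.create(
    index='custom_analyzer',
    body={
        "settings": {
            "number_of_shards": 5,
            "number_of_replicas": 0,
            "analysis": {
                "analyzer": {
                    "my_ngram_analyzer": {
                        "type": "custom",
                        "filter": ["lowercase"],
                        "tokenizer": "ngram_tokenizer"
                    }
                },
                "tokenizer": {
                    "ngram_tokenizer": {
                        "type": "ngram",  # was "n_gram"
                        "min_gram": 3,
                        "max_gram": 3,
                        "token_chars": ["letter", "digit"]
                    }
                }
            }
        },
        "mappings": {  # was "mapping"
            "properties": {
                "msg_txt": {
                    "type": "text",
                    "analyzer": "my_ngram_analyzer"  # was "ngram_tokenizer"
                }
            }
        }
    }
)
print(response)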

Once again, thanks for your gracious help! :smile:
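
P.S. In case it helps anyone else hitting the same problem, a quick way to sanity-check the analyzer once the index exists is the analyze API (a minimal sketch, reusing the same es client and index name as above):

# Run "my_ngram_analyzer" over a sample string and inspect the tokens.
result = es.indices.analyze(
    index='custom_analyzer',
    body={
        "analyzer": "my_ngram_analyzer",
        "text": "Hello SMS 123"
    }
)
# Expect lowercase 3-grams of letters/digits: "hel", "ell", "llo", "sms", "123"
print([t["token"] for t in result["tokens"]])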

