Issue with "Invalid stemmer class specified: Cjk"

Dear all,

I know there is already another topic about this, but I could not find any usable solution there, and since I have been searching for hours, I resolved to post here.

I know I am using an outdated stack, but it is for a customer and I cannot really change it, so please forgive me in advance.

I am currently using Elasticsearch 5.5 with OpenJDK 1.8 and Magento 2.2.3.
I am not very experienced with Elasticsearch, so I am quite confused by this.

The goal is to make Magento 2.2.3 work with Elasticsearch for a Chinese-language website.
Along the way I ran into several errors, which I fixed:

  • I installed the analysis-icu and analysis-smartcn plugins, and they are loaded:
    ** [2021-03-15T18:55:17,710][INFO ][o.e.p.PluginsService ] [MSpxtmr] loaded plugin [analysis-icu]
    ** [2021-03-15T18:55:17,710][INFO ][o.e.p.PluginsService ] [MSpxtmr] loaded plugin [analysis-smartcn]
  • I hit the total fields limit of 1000, so I raised it to 3000.
  • Now, when Magento tries to create an index in Chinese (apparently), it fails with the following error:
[2021-03-15T19:02:30,740][DEBUG][o.e.a.b.TransportShardBulkAction] [MSpxtmr] [magento2_product_2_v1][4] failed to execute bulk item (index) BulkShardRequest [[magento2_product_2_v1][4]] containing [116] requests and a refresh
java.lang.IllegalArgumentException: Invalid stemmer class specified: Cjk
        at org.apache.lucene.analysis.snowball.SnowballFilter.<init>(SnowballFilter.java:83) ~[lucene-analyzers-common-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:30:03]
        at org.elasticsearch.index.analysis.StemmerTokenFilterFactory.create(StemmerTokenFilterFactory.java:256) ~[elasticsearch-5.5.3.jar:5.5.3]
        at org.elasticsearch.index.analysis.CustomAnalyzer.createComponents(CustomAnalyzer.java:86) ~[elasticsearch-5.5.3.jar:5.5.3]
        at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:134) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
        at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:134) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
        at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:198) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
        at org.apache.lucene.document.Field.tokenStream(Field.java:574) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:740) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:447) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:403) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:478) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
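For reference, the total-fields change from the list above corresponds to the index setting `index.mapping.total_fields.limit`, which shows up as `"limit" : "3000"` in the index settings quoted later in this post. Below is a minimal Python sketch that only builds the matching `PUT /<index>/_settings` request with the standard library. The host and the helper name `build_total_fields_request` are assumptions for illustration, and this is not necessarily how the change was actually made here.

```python
import json
import urllib.request


def build_total_fields_request(host: str, index: str, limit: int) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index>/_settings request that sets
    index.mapping.total_fields.limit. Helper name is hypothetical."""
    body = json.dumps({"index.mapping.total_fields.limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Assumed local node and the index name from this post.
    req = build_total_fields_request("http://localhost:9200", "magento2_product_2_v1", 3000)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it to a running node:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

The request is only constructed, so the sketch can be inspected without a running cluster; sending it is left commented out.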

If I understand correctly, a class called "Cjk" is missing, but I do not understand why, since the plugins are loaded, as shown both in the logs and by the "GET _cat/plugins" command. (I installed them with the elasticsearch-plugin command.)
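A quick way to double-check the plugin list programmatically is to ask `_cat/plugins` for only the component column (`GET _cat/plugins?h=component`) and compare it against the plugins you expect. The sketch below assumes that output format (one plugin name per line, repeated once per node) and only parses text; the helper names `loaded_plugins` and `missing_plugins` are hypothetical.

```python
from __future__ import annotations


def loaded_plugins(cat_output: str) -> set[str]:
    """Parse `_cat/plugins?h=component` text output (assumed: one name per line).
    Using a set collapses duplicates when several nodes report the same plugin."""
    return {line.strip() for line in cat_output.splitlines() if line.strip()}


def missing_plugins(cat_output: str, required: list[str]) -> list[str]:
    """Return the required plugin names that do not appear in the output, sorted."""
    return sorted(set(required) - loaded_plugins(cat_output))


if __name__ == "__main__":
    # Illustrative output, using the plugin names from the log lines in this post.
    sample = "analysis-icu\nanalysis-smartcn\n"
    print(missing_plugins(sample, ["analysis-icu", "analysis-smartcn"]))
```

Keep in mind that this only confirms the plugins are installed; it does not show which `language` values the built-in `stemmer` filter accepts.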

I also tried copying the JAR files into the lib directory, but this only caused a "jar hell" error, so I removed them again. I take this to mean the classes were already being found and loaded.

In case it helps, here are the settings of the failing index:

[root@bulgaricn-mag2-dev-ss1 bin]# curl -X GET "localhost:9200/magento2_product_2_v1/_settings?pretty"
{
  "magento2_product_2_v1" : {
    "settings" : {
      "index" : {
        "mapping" : {
          "total_fields" : {
            "limit" : "3000"
          }
        },
        "number_of_shards" : "5",
        "provided_name" : "magento2_product_2_v1",
        "creation_date" : "1615801921169",
        "analysis" : {
          "filter" : {
            "default_stemmer" : {
              "type" : "stemmer",
              "language" : "cjk"
            },
            "unique_stem" : {
              "type" : "unique",
              "only_on_same_position" : "true"
            }
          },
          "char_filter" : {
            "default_char_filter" : {
              "type" : "html_strip"
            }
          },
          "analyzer" : {
            "default" : {
              "filter" : [
                "lowercase",
                "keyword_repeat",
                "default_stemmer",
                "unique_stem"
              ],
              "char_filter" : [
                "default_char_filter"
              ],
              "type" : "custom",
              "tokenizer" : "default_tokenizer"
            }
          },
          "tokenizer" : {
            "default_tokenizer" : {
              "type" : "standard"
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "rmNwcq49Rv-yvFV_PIjhDg",
        "version" : {
          "created" : "5050399"
        }
      }
    }
  }
}
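The stack trace suggests where the name `Cjk` comes from: `StemmerTokenFilterFactory.create` hands the configured language on to Lucene's `SnowballFilter` constructor, which then throws. The following is a simplified Python model of that path, assuming (based on the trace and a reading of the Elasticsearch 5.x sources, not verified against this exact build) that the factory capitalizes the `language` value, handles some names itself, and otherwise looks up a Snowball class named `org.tartarus.snowball.ext.<Name>Stemmer`. The name sets in the code are hypothetical, incomplete samples, and Python's `ValueError` stands in for Java's `IllegalArgumentException`.

```python
from __future__ import annotations

# Hypothetical, incomplete sample of language names the factory handles itself.
HANDLED_BY_FACTORY = {"english", "light_english", "minimal_english"}

# Hypothetical, incomplete sample of Snowball stemmer classes assumed to exist.
SNOWBALL_CLASSES = {"EnglishStemmer", "PorterStemmer", "RussianStemmer"}


def resolve_stemmer(language: str) -> str:
    """Simplified model of how a `stemmer` filter's `language` is resolved."""
    # Capitalize only the first character, e.g. "cjk" -> "Cjk".
    name = language[:1].upper() + language[1:]
    if language.lower() in HANDLED_BY_FACTORY:
        return f"factory:{language.lower()}"
    # Fallback: look up a Snowball class built from the capitalized name.
    class_name = f"{name}Stemmer"
    if class_name not in SNOWBALL_CLASSES:
        # Mirrors the message seen in the log (ValueError instead of the Java exception).
        raise ValueError(f"Invalid stemmer class specified: {name}")
    return f"org.tartarus.snowball.ext.{class_name}"


def stemmer_languages(settings: dict, index: str) -> dict[str, str]:
    """Map each filter of type "stemmer" in a _settings response to its language."""
    filters = settings[index]["settings"]["index"]["analysis"]["filter"]
    return {n: f["language"] for n, f in filters.items() if f.get("type") == "stemmer"}


if __name__ == "__main__":
    # Abbreviated copy of the "filter" section from the settings above.
    settings = {"magento2_product_2_v1": {"settings": {"index": {"analysis": {"filter": {
        "default_stemmer": {"type": "stemmer", "language": "cjk"},
        "unique_stem": {"type": "unique", "only_on_same_position": "true"},
    }}}}}}
    for filter_name, lang in stemmer_languages(settings, "magento2_product_2_v1").items():
        try:
            print(filter_name, "->", resolve_stemmer(lang))
        except ValueError as err:
            print(filter_name, "->", err)
```

Under these assumptions, running it prints `default_stemmer -> Invalid stemmer class specified: Cjk`, which matches the first line of the exception in the log.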

I would really appreciate some help with this, as I have already spent more than 3 days on it.

Looking forward to hearing from you.

Olivier
