Elasticsearch stopped allocating shards due to missing kuromoji_part_of_speech filter

Hi,
We have a web index containing Japanese documents, which uses analyzers from the analysis-icu and analysis-kuromoji plugins. Our Elasticsearch cluster is deployed via ECK, and the plugins are installed by an init container.

This has worked fine so far. Recently, however, the nodes suddenly stopped allocating the shards with this error:

shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-11-10T08:07:20.393Z], failed_attempts[5], failed_nodes[[DZkSjlD3SSyHPmrFEL1SzQ, pK3-4-S5TtStIwG1YArLag, 2_EeQbAlS0WoANaZT7H9oQ]], delayed=false, details[failed shard on node [pK3-4-S5TtStIwG1YArLag]: failed to create index, failure IllegalArgumentException[Unknown filter type [kuromoji_part_of_speech] for [ja_pos_filter]]], allocation_status[no_attempt]]]

ja_pos_filter looks like this:

"ja_pos_filter" : {
  "type" : "kuromoji_part_of_speech",
  "stoptags" : [
    "\\u52a9\\u8a5e-\\u683c\\u52a9\\u8a5e-\\u4e00\\u822c",
    "\\u52a9\\u8a5e-\\u7d42\\u52a9\\u8a5e"
  ]
}

I checked that the analysis-kuromoji plugin is correctly installed; it shows up in _cat/plugins. The Elasticsearch version is 7.7.1. I tried upgrading one of the affected nodes to 7.9.3, but it still has the same issue. We do not cache the downloaded plugins, so I wonder whether a more recent version of the plugin changed something that would cause this incompatibility. There are no errors in the node's runtime log.

At this point, I am afraid of restarting more nodes, since that would eventually result in the whole index being unavailable. Any idea what the problem might be?

I checked the contents of the 7.9.3 kuromoji plugin, and the AnalysisKuromojiPlugin class does add KuromojiPartOfSpeechFilterFactory to the token filter map under the key kuromoji_part_of_speech. I am at a total loss as to why it isn't working.

The issue turned out to be quite trivial. The nodes reporting the error did indeed have the needed plugins, but a handful of other nodes had failed to download them properly and started anyway. Whether a shard allocation succeeded or failed was essentially random, and the node reporting the error was never the node that was actually missing the plugin. After ensuring that ALL nodes had the needed plugins without exception, the issue resolved itself immediately.
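Since _cat/plugins lists plugins per node, one way to catch this kind of gap early is to diff each node's plugin list against the set of nodes. A minimal sketch, assuming output in the shape of GET _cat/plugins?h=name,component (the node names and the missing entry below are made up for illustration):

```shell
#!/usr/bin/env bash
# Sample of what `GET _cat/plugins?h=name,component` might return;
# in a real check you would fetch this with curl from the cluster.
cat_plugins='node-0 analysis-icu
node-0 analysis-kuromoji
node-1 analysis-icu
node-2 analysis-icu
node-2 analysis-kuromoji'

# Print every node that does NOT report analysis-kuromoji:
echo "$cat_plugins" | awk '
  { nodes[$1]; if ($2 == "analysis-kuromoji") has[$1] }
  END { for (n in nodes) if (!(n in has)) print n }'
# prints: node-1
```

Running this against live _cat/plugins output would have pointed straight at the nodes that started without the plugin.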

Adding set -e at the start of the init container's install script should ensure this doesn't happen again, since a failed plugin download will then fail the init container instead of letting the node start without the plugin.
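To illustrate why set -e matters here, a small self-contained sketch (the failing function is just a stand-in for a failed plugin download, not the real install script):

```shell
#!/usr/bin/env bash
# Body shared by both runs: a download that fails, then the step that
# should only run if everything before it succeeded.
script='
failing_download() { return 1; }   # stand-in for a failed plugin download
failing_download
echo "node started anyway"
'

# Without set -e the failure is silently ignored and the script keeps going:
bash -c "$script"                                     # prints: node started anyway

# With set -e the script aborts at the first failing command:
bash -c "set -e; $script" || echo "install aborted"   # prints: install aborted
```

With set -e in place, the init container exits non-zero on a bad download, so Kubernetes never starts the Elasticsearch container on that node.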
