Hi, I'm using Elasticsearch 5.0.1 with the Kuromoji plugin.
Until now I've only used its default configuration, via the following mapping:
"analyzer": "kuromoji"
However, as stated in the docs (Kuromoji analyzer), it consists of a character filter, a tokenizer and various token filters. Some of these seem to be applied in the analyzer's default settings (e.g. the kuromoji_baseform token filter), others not (e.g. the kuromoji_number token filter).
I would like to know which filters are used in Kuromoji's default settings. Is there any way to find out with an API call?
I've tried looking into the plugin sources, as this doesn't seem to be documented: KuromojiAnalyzerProvider.java
However, there don't seem to be any character filters or tokenizers defined there.
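One possible way to look at this from the API side is the `_analyze` endpoint with `explain` enabled. The following is only a minimal sketch: it assumes a node reachable at `http://localhost:9200` with the Kuromoji plugin installed, the helper names `build_analyze_body` and `run_analyze` are made up for illustration, and the sample text is arbitrary. It builds one request for the built-in `kuromoji` analyzer and one for an explicit chain using only the components named above, so the two token streams can be compared.

```python
import json
import urllib.request


def build_analyze_body(text, analyzer=None, tokenizer=None, filters=None):
    """Build a JSON body for the _analyze API with explain enabled.

    Hypothetical helper: pass either a named analyzer, or a tokenizer
    plus a list of token filter names for an ad-hoc chain.
    """
    body = {"text": text, "explain": True}
    if analyzer is not None:
        body["analyzer"] = analyzer
    else:
        body["tokenizer"] = tokenizer
        body["filter"] = list(filters or [])
    return body


def run_analyze(body, host="http://localhost:9200"):
    """POST the body to <host>/_analyze and return the decoded response.

    The host is an assumption; adjust it to your cluster.
    """
    request = urllib.request.Request(
        host + "/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Request for the built-in analyzer used in the mapping above.
builtin_body = build_analyze_body("東京都に行きました", analyzer="kuromoji")

# Request for an explicit chain, using only components named in this post.
custom_body = build_analyze_body(
    "東京都に行きました",
    tokenizer="kuromoji_tokenizer",
    filters=["kuromoji_baseform", "kuromoji_number"],
)

# To send them (requires a running node):
#   print(json.dumps(run_analyze(builtin_body), indent=2, ensure_ascii=False))
#   print(json.dumps(run_analyze(custom_body), indent=2, ensure_ascii=False))
```

As far as I can tell, `explain` reports a built-in analyzer such as `kuromoji` as a single unit rather than listing its individual filters, so the per-stage output is only available for the explicit chain. Comparing the two results is therefore an indirect check, not a direct listing of the defaults.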
Oh great, thanks! I didn't know that JapaneseAnalyzer is also part of Kuromoji.
So it seems from this file that the following components are used by default: