How to use user dictionary on elasticsearch / elasticsearch-analysis-kuromoji

Mai_2 · March 19, 2013, 4:01pm

Hi,

I've just started to use Elasticsearch with elasticsearch /
elasticsearch-analysis-kuromoji, which is a Japanese tokenizer. It works well,
and now I would like to know how to use a user dictionary. From its source
code, it seems to support user dictionaries.

Thank you in advance for your support.

Regards,
Mai Nakagawa

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

johtani · March 20, 2013, 7:16am

Hi Mai

I tried the user dictionary with elasticsearch-0.90.0 Beta1 and elasticsearch-analysis-kuromoji/1.2.0.

Put the "userdict_ja.txt" file at "ES_HOME/config/userdict_ja.txt". (I used SOLR_HOME/example/solr/collection1/config/lang/userdict_ja.txt.)
Then register the analyzer "my_analyzer", which uses kuromoji_tokenizer with the user_dictionary setting:

$ curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d '
{
  "index": {
    "analysis": {
      "tokenizer": {
        "kuromoji_user_dict": {
          "type": "kuromoji_tokenizer",
          "user_dictionary": "userdict_ja.txt"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji_user_dict"
        }
      }
    }
  }
}
'
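
The settings above only name the dictionary file, so the file has to exist before the index is created. As a rough sketch, a one-entry dictionary could be written like this. It assumes ES_HOME points at the Elasticsearch install (the fallback directory here is made up for illustration), and the entry follows the Kuromoji user dictionary CSV layout of surface form, segmentation, reading, and part-of-speech tag, as in the sample file shipped with Solr:

```shell
# Assumption: ES_HOME is the Elasticsearch install directory.
# The fallback path is hypothetical and only keeps this sketch runnable.
ES_HOME="${ES_HOME:-$PWD/elasticsearch}"
mkdir -p "$ES_HOME/config"

# One entry per line: surface form, segmentation, reading, part-of-speech tag.
# This entry keeps the name "朝青龍" as a single token.
cat > "$ES_HOME/config/userdict_ja.txt" <<'EOF'
朝青龍,朝青龍,アサショウリュウ,カスタム人名
EOF
```

The segmentation column is where a compound can be split into several space-separated tokens; here it repeats the surface form so the word stays whole.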

Analyze "朝青龍" using "my_analyzer" with the user dictionary. "朝青龍" is included in userdict_ja.txt, so it comes back as a single token:
$ curl -XGET 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d '朝青龍'
{
  "tokens" : [ {
    "token" : "朝青龍",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 1
  } ]
}

Analyze "朝青龍" using the default "kuromoji" analyzer, without the user dictionary. Here it is split into two tokens:
$ curl -XGET 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=kuromoji&pretty' -d '朝青龍'
{
  "tokens" : [ {
    "token" : "朝",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "青龍",
    "start_offset" : 1,
    "end_offset" : 3,
    "type" : "word",
    "position" : 2
  } ]
}

Jun Ohtani
johtani@gmail.com
blog : http://johtani.jugem.jp
twitter : http://twitter.com/johtani

Mai_2 · March 20, 2013, 4:19pm

Hi Jun san,

Thank you for your reply.

I've tried it with the latest stable versions, elasticsearch-0.20.5 and
elasticsearch-analysis-kuromoji/1.1.0, and it works there as well!

Mai

