How to use a user dictionary with elasticsearch / elasticsearch-analysis-kuromoji

Hi,

I've just started to use Elasticsearch with elasticsearch /
elasticsearch-analysis-kuromoji, which is a Japanese tokenizer. It works well,
and now I would like to know how to use a user dictionary. From its source
code, it seems to support user dictionaries.

Thank you in advance for your support.

Regards,
Mai Nakagawa

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Mai

I tried the user dictionary with elasticsearch-0.90.0 Beta1 and elasticsearch-analysis-kuromoji/1.2.0.

Register the analyzer "my_analyzer", which uses kuromoji_tokenizer with the user_dictionary setting.
Put the "userdict_ja.txt" file at "ES_HOME/config/userdict_ja.txt". (I used SOLR_HOME/example/solr/collection1/conf/lang/userdict_ja.txt.)

$ curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
  "index" : {
    "analysis" : {
      "tokenizer" : {
        "kuromoji_user_dict" : {
          "type" : "kuromoji_tokenizer",
          "user_dictionary" : "userdict_ja.txt"
        }
      },
      "analyzer" : {
        "my_analyzer" : {
          "type" : "custom",
          "tokenizer" : "kuromoji_user_dict"
        }
      }
    }
  }
}
'
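
For reference, here is a minimal sketch of creating the dictionary file that the "user_dictionary" setting points to. It assumes ES_HOME is the Elasticsearch install directory (the sketch falls back to the current directory if it is unset) and uses the 朝青龍 entry as it appears in Solr's sample userdict_ja.txt. Each entry has four comma-separated fields: surface form, segmentation, reading, and a part-of-speech tag. The file should be in place before the index is created.

```shell
# Assumption: ES_HOME is the Elasticsearch install directory.
# Falls back to the current directory so the sketch can run anywhere.
ES_HOME="${ES_HOME:-.}"
mkdir -p "$ES_HOME/config"

# Note: this overwrites any existing userdict_ja.txt in that directory.
# Entry format: surface form,segmentation,reading,part-of-speech tag
cat > "$ES_HOME/config/userdict_ja.txt" <<'EOF'
# Custom reading for a former sumo wrestler (entry from Solr's sample file)
朝青龍,朝青龍,アサショウリュウ,カスタム人名
EOF
```

Lines starting with "#" are comments. To split a compound instead of keeping it whole, the second and third fields would list the segments and their readings separated by spaces.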

Analyze "朝青龍" using "my_analyzer" with the user dictionary. "朝青龍" is included in userdict_ja.txt.
$ curl -XGET 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d '朝青龍'
{
  "tokens" : [ {
    "token" : "朝青龍",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 1
  } ]
}

Analyze "朝青龍" using the default "kuromoji" analyzer, without the user dictionary. The word is split into two tokens.
$ curl -XGET 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=kuromoji&pretty' -d '朝青龍'
{
  "tokens" : [ {
    "token" : "朝",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "青龍",
    "start_offset" : 1,
    "end_offset" : 3,
    "type" : "word",
    "position" : 2
  } ]
}
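
To compare the two analyzers at a glance, it can help to reduce each response to its token strings. The following is a rough sketch only: extract_tokens is a hypothetical helper name (not part of Elasticsearch), and it relies on plain grep/sed pattern matching rather than a real JSON parser, so it assumes token values contain no escaped quotes. The sample input is the default "kuromoji" response from this thread.

```shell
# Print one token per line from an _analyze response read on stdin.
# extract_tokens is a hypothetical helper, not an Elasticsearch command.
extract_tokens() {
  # Keep only the "token" : "..." pairs, then strip the key and the quotes.
  grep -o '"token" *: *"[^"]*"' | sed -e 's/^"token" *: *"//' -e 's/"$//'
}

# Sample input: the response of the default "kuromoji" analyzer.
extract_tokens <<'EOF'
{
  "tokens" : [ {
    "token" : "朝",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "青龍",
    "start_offset" : 1,
    "end_offset" : 3,
    "type" : "word",
    "position" : 2
  } ]
}
EOF
# prints:
# 朝
# 青龍
```

Against a running node, the same helper could be fed directly from the curl commands above by appending "| extract_tokens" (with "-s" added to curl to silence progress output).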


Jun Ohtani
johtani@gmail.com
blog : http://johtani.jugem.jp
twitter : http://twitter.com/johtani

On 2013/03/20, at 1:01, Mai <nakagawa.mai@gmail.com> wrote:


Hi Jun san,

Thank you for your reply.

I've tried it with the latest stable versions, elasticsearch-0.20.5 and
elasticsearch-analysis-kuromoji/1.1.0, and it works there as well!

Mai

On Wednesday, March 20, 2013 12:16:48 AM UTC-7, johtani wrote:
