Kuromoji tokenizers and uax_url_email


(Hoang Vu Kim) #1

Hi all,
I'm using Elasticsearch to analyze keywords and text in Japanese Twitter data.
I'm also using the kuromoji tokenizer. It works very well, except when the text contains a URL, as in this case:

POST twitter_elastic_example/_analyze
{
  "analyzer": "my_analyzer",
  "text": "今日の仕事は終わられない http://yahoo.co.jp"
}

Actual result: [今日,仕事,終わる,http,yahoo,co,jp]
What I want: [今日,仕事,終わる,http://yahoo.co.jp]

I searched around and tried "tokenizer": "uax_url_email". With it the URL is kept as a single token, but the Japanese text is no longer tokenized correctly.
An analyzer accepts only one tokenizer, so uax_url_email and kuromoji_tokenizer can't be combined in the same analyzer. How can I solve this?
Thanks for your help, and sorry for my bad English!
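
One possible workaround, sketched here under assumptions rather than as a confirmed solution: pull the URLs out of the tweet on the client side before indexing, so the Japanese part can go to the kuromoji analyzer and the URLs can be stored separately (for example in a keyword field). The function name `split_urls` and the simplified URL pattern are hypothetical and are not part of Elasticsearch.

```python
import re

# Simplified URL pattern (an assumption): "http://" or "https://" followed by
# printable ASCII characters, so it stops at whitespace or Japanese text.
URL_PATTERN = re.compile(r"https?://[!-~]+")


def split_urls(text):
    """Return (text_without_urls, urls) for a single tweet."""
    urls = URL_PATTERN.findall(text)
    # Replace each URL with a space, then collapse runs of whitespace.
    remaining = " ".join(URL_PATTERN.sub(" ", text).split())
    return remaining, urls


if __name__ == "__main__":
    remaining, urls = split_urls("今日の仕事は終わられない http://yahoo.co.jp")
    print(remaining)
    print(urls)
```

The remaining text could then be indexed into the field that uses my_analyzer, and the URL list into a separate field that is not analyzed.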


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.