Problem with uax_url_email tokenizer


(Yuriy Vasilyev) #1

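I have Cyrillic URLs in my index that look like www.100тысячкниг.рф,
and prefix search on these entries works badly :(
If I try a non-Cyrillic URL such as www.yandex.ru, everything works fine.

How can I solve this problem?

Below is the analyzer output for www.100тысячкниг.рф, followed by the output for www.yandex.ru.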

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 198

{"tokens":[{"token":"www","start_offset":0,"end_offset":3,"type":"","position":1},
{"token":"100тысячкниг.рф","start_offset":4,"end_offset":19,"type":"","position":2}]}
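One quick way to read this response is to slice the analyzed string using each token's start_offset and end_offset. The sketch below assumes the analyzed text was exactly www.100тысячкниг.рф, which is what offsets 0 to 19 imply. It also assumes the offsets count characters. That holds here because every character is in the Basic Multilingual Plane. token_spans is a hypothetical helper, not an Elasticsearch API.

```python
import json

# Body of the analyzer response above (Cyrillic URL), rejoined onto one line.
RESPONSE = (
    '{"tokens":[{"token":"www","start_offset":0,"end_offset":3,"type":"","position":1},'
    '{"token":"100тысячкниг.рф","start_offset":4,"end_offset":19,"type":"","position":2}]}'
)
# Assumption: the analyzed input was the URL from the question (19 characters,
# matching the last token's end_offset).
TEXT = "www.100тысячкниг.рф"


def token_spans(text, body):
    """Hypothetical helper: pair each token with the slice of text its offsets cover."""
    tokens = json.loads(body)["tokens"]
    return [(t["token"], text[t["start_offset"]:t["end_offset"]]) for t in tokens]


for token, span in token_spans(TEXT, RESPONSE):
    print(f"{token!r} covers {span!r}")
```

Both tokens match their slices. The character at offset 3, the first ".", belongs to no token. So the tokenizer broke the URL there instead of emitting it as a single token.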

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json; charset=UTF-8
Content-Length: 104

{"tokens":[{"token":"www.yandex.ru","start_offset":0,"end_offset":13,"type":"","position":1}]}
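The difference between the two responses explains the prefix-search symptom, provided the search is a term-level prefix query. The post doesn't show the query itself, so that is an assumption. Under it, a document matches only if some indexed term starts with the prefix. matches_prefix and the sample prefixes below are hypothetical and only model that behaviour.

```python
# Terms taken from the "token" fields of the two analyzer responses above.
CYRILLIC_TERMS = ["www", "100тысячкниг.рф"]
LATIN_TERMS = ["www.yandex.ru"]


def matches_prefix(terms, prefix):
    """Hypothetical model of a term-level prefix query: any indexed term starting with prefix."""
    return any(term.startswith(prefix) for term in terms)


# A prefix that crosses the first dot can only match a term that kept the dot.
print(matches_prefix(LATIN_TERMS, "www.yan"))     # True
print(matches_prefix(CYRILLIC_TERMS, "www.100"))  # False: no single term contains "www.100"
print(matches_prefix(CYRILLIC_TERMS, "100тыс"))   # True: matches only the second term
```

Under this model, a prefix typed from the start of the Cyrillic URL stops matching once it crosses the first dot. The equivalent prefix on www.yandex.ru keeps matching because that URL was indexed as one term.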


(Yuriy Vasilyev) #2

Bump.

Original post by Yuriy Vasilyev: Wednesday, June 6, 2012, 11:37:28 UTC+7.

