Keyword tokenizer


(paul) #1

My analyzer settings look like this:

"autocomplete_index":{
   "type":"custom",
   "tokenizer":"keyword",
   "filter":[
      "lowercase",
      "syns_filter",
      "my_edgeNgram"
   ]
}

Now when I analyze the configuration using the analyze API, the word after the
space gets omitted, i.e. "university" is omitted:

................../universityindextest2/_analyze?analyzer=autocomplete_index&text=yale%20university&pretty

Output:

{
  "tokens" : [
    { "token" : "ya", "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 1 },
    { "token" : "yal", "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 2 },
    { "token" : "yale", "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 3 },
    { "token" : "yu", "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 4 }
  ]
}
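
For anyone who wants to repeat the request above from a script, here is a minimal sketch of how the URL could be assembled. The host is elided in the post, so `http://localhost:9200` is purely an assumed placeholder, and `analyze_url` is a hypothetical helper name, not part of any client library. It only builds the string and does not contact a cluster.

```python
from urllib.parse import quote, urlencode


def analyze_url(base_url, index, analyzer, text):
    """Build a query-string style _analyze URL like the one in the post."""
    # quote_via=quote encodes the space as %20 (the default would emit '+').
    query = urlencode({"analyzer": analyzer, "text": text}, quote_via=quote)
    return f"{base_url}/{index}/_analyze?{query}&pretty"


# Assumed host; the index and analyzer names come from the post.
print(analyze_url("http://localhost:9200", "universityindextest2",
                  "autocomplete_index", "yale university"))
```

The space in the text is percent-encoded, so the whole phrase reaches the analyzer as one input string. What the tokenizer does with that string is a separate matter.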

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d6bd7caa-b160-42ac-948c-6aab6884a51d%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Binh Ly) #2

Paul, is it possible that your "syns_filter" is affecting your ngram
filter? What happens when you remove the syns_filter?

(In reply to paul's message of Wednesday, January 22, 2014 6:17:12 AM UTC-5.)


(paul) #3

Binh, when I removed the syns_filter the result was still the same, but when I
changed "tokenizer":"keyword" to "whitespace", "university" was taken into
account. Maybe it's a tokenizer problem: when there is a space, the
keyword tokenizer omits the word after the space.

-paul
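
The tokenizer swap described above could be written as index settings roughly like the sketch below. Several things here are assumptions, since the thread never shows the filter definitions: the `min_gram`/`max_gram` values of 2 and 4 are guessed from the "ya", "yal", "yale" output, `syns_filter` is left out because its definition is not shown, and the filter type is spelled `edgeNGram` as in Elasticsearch releases of that era (newer releases call it `edge_ngram`). `build_settings` is a hypothetical helper name.

```python
import json


def build_settings(tokenizer="whitespace", min_gram=2, max_gram=4):
    """Return an index settings body for the autocomplete analyzer."""
    return {
        "analysis": {
            "filter": {
                # Assumed definition; the real my_edgeNgram is not in the thread.
                "my_edgeNgram": {
                    "type": "edgeNGram",
                    "min_gram": min_gram,
                    "max_gram": max_gram,
                }
            },
            "analyzer": {
                "autocomplete_index": {
                    "type": "custom",
                    # "keyword" in the original post, "whitespace" after the change.
                    "tokenizer": tokenizer,
                    "filter": ["lowercase", "my_edgeNgram"],
                }
            },
        }
    }


print(json.dumps(build_settings(), indent=2))
```

Passing `tokenizer="keyword"` gives back the shape of the original configuration, which makes the two variants easy to compare side by side.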

(In reply to Binh Ly's message of Wed, Jan 22, 2014 at 11:00 PM.)


(Binh Ly) #4

Paul, yes, you are correct; I missed that. The keyword tokenizer takes
your entire string and emits it as a single token ("yale university"), so the
edge ngram filter only produces prefixes of that one token. That's why it is
not ngramming "university".
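
To make the difference concrete, here is a small pure-Python imitation of the two pipelines. It is only a sketch of the behaviour described above, not Elasticsearch code: the function names are made up for illustration, `syns_filter` is left out, and `min_gram=2` / `max_gram=4` are assumptions inferred from the "ya", "yal", "yale" tokens in the output (the "yu" token is not reproduced here).

```python
def keyword_tokenize(text):
    # Keyword tokenizer: the whole input becomes exactly one token.
    return [text]


def whitespace_tokenize(text):
    # Whitespace tokenizer: one token per whitespace-separated word.
    return text.split()


def edge_ngrams(token, min_gram=2, max_gram=4):
    # Prefixes of the token, from min_gram up to max_gram characters.
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text, tokenizer, min_gram=2, max_gram=4):
    # Tokenize, lowercase each token, then expand each into edge n-grams.
    grams = []
    for token in tokenizer(text):
        grams.extend(edge_ngrams(token.lower(), min_gram, max_gram))
    return grams


print(analyze("yale university", keyword_tokenize))
# ['ya', 'yal', 'yale']
print(analyze("yale university", whitespace_tokenize))
# ['ya', 'yal', 'yale', 'un', 'uni', 'univ']
```

With the keyword tokenizer the only token is "yale university", so every prefix starts at "y" and nothing ever starts at "university". With the whitespace tokenizer each word gets its own prefixes.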

(In reply to paul's message of Friday, January 24, 2014 12:03:34 AM UTC-5.)

