Keyword tokenizer

paul1 · January 22, 2014, 11:17am

My analyzer settings look as below:

"autocomplete_index": {
  "type": "custom",
  "tokenizer": "keyword",
  "filter": [
    "lowercase",
    "syns_filter",
    "my_edgeNgram"
  ]
}

Now when I test this analyzer with the Analyze API, the word after the space
gets omitted, i.e. "university" is omitted:

................../universityindextest2/_analyze?analyzer=autocomplete_index&text=yale%20university&pretty

Output:

{
  "tokens" : [
    { "token" : "ya",   "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 1 },
    { "token" : "yal",  "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 2 },
    { "token" : "yale", "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 3 },
    { "token" : "yu",   "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 4 }
  ]
}
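Every gram above carries "start_offset" : 0 and "end_offset" : 15, and 15 is the length of "yale university". The minimal Python sketch below is a plain simulation of the chain (not Elasticsearch code) that reproduces this: the keyword tokenizer emits the whole input as one token spanning offsets 0 to 15, and each gram inherits those offsets. It assumes my_edgeNgram uses min_gram=2 and max_gram=4, guessed from the ya/yal/yale grams because the real settings are not shown, and it leaves out syns_filter, whose rules are not shown either, so it does not try to reproduce the "yu" token.

```python
# Plain-Python simulation of the autocomplete_index chain (not Elasticsearch code).
# Assumption: my_edgeNgram uses min_gram=2, max_gram=4 (guessed from ya/yal/yale).
# syns_filter is omitted because its rules are not shown in the thread.

def keyword_tokenizer(text):
    """Emit the whole input as a single (term, start_offset, end_offset) token."""
    return [(text, 0, len(text))]


def lowercase(tokens):
    """Lowercase each term; offsets are unchanged."""
    return [(term.lower(), start, end) for term, start, end in tokens]


def edge_ngram(tokens, min_gram=2, max_gram=4):
    """Emit leading substrings of each token; each gram keeps its token's offsets."""
    grams = []
    for term, start, end in tokens:
        for n in range(min_gram, min(max_gram, len(term)) + 1):
            grams.append((term[:n], start, end))
    return grams


tokens = edge_ngram(lowercase(keyword_tokenizer("yale university")))

# Number the grams sequentially, as they appear in the Analyze API output.
for position, (term, start, end) in enumerate(tokens, start=1):
    print(f"{term!r:8} start_offset={start} end_offset={end} position={position}")
```

So "university" is not actually dropped before the n-gram step: it is still inside the one token, it just never sits at the start of a token.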

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d6bd7caa-b160-42ac-948c-6aab6884a51d%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Binh_Ly · January 22, 2014, 5:30pm

Paul, is it possible that your "syns_filter" is affecting your ngram
filter? What happens when you remove the syns_filter?

paul1 · January 24, 2014, 5:03am

Binh, when I removed the syns_filter the result was still the same, but when I
changed "tokenizer": "keyword" to "whitespace", "university" was taken
into account. Maybe it's a tokenizer problem: when there is a space, the
keyword tokenizer omits the word after the space.

-paul
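The difference Paul observed can be reproduced with a small Python simulation (a sketch, not Elasticsearch's implementation). With the keyword tokenizer, the edge n-gram step sees one token and only produces prefixes of "yale university"; with a whitespace tokenizer it sees "yale" and "university" as separate tokens and produces prefixes of each. The gram sizes (2 to 4) are hypothetical, since the my_edgeNgram definition is not shown in the thread.

```python
import re

# Plain-Python comparison of two tokenizers feeding the same edge n-gram step.
# The gram sizes (min_gram=2, max_gram=4) are hypothetical.

def keyword_tokenizer(text):
    # The whole input becomes one token.
    return [(text, 0, len(text))]


def whitespace_tokenizer(text):
    # One token per run of non-whitespace characters, with its offsets.
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def lowercase(tokens):
    return [(term.lower(), start, end) for term, start, end in tokens]


def edge_ngram(tokens, min_gram=2, max_gram=4):
    # Leading substrings of each token only; no gram starts mid-token.
    return [
        (term[:n], start, end)
        for term, start, end in tokens
        for n in range(min_gram, min(max_gram, len(term)) + 1)
    ]


results = {}
for name, tokenizer in [("keyword", keyword_tokenizer), ("whitespace", whitespace_tokenizer)]:
    grams = edge_ngram(lowercase(tokenizer("yale university")))
    results[name] = [term for term, _, _ in grams]
    print(name, results[name])
```

The whitespace variant also gives "university" its own offsets (5 to 15), which is why its prefixes become searchable.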

Binh_Ly · January 24, 2014, 4:57pm

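Paul, yes, you are correct; I missed that. The keyword tokenizer takes
your entire string and makes it into a single token, "yale university".
The edge ngram filter only builds prefixes from the start of each token,
so no grams ever begin at "university", which is why it is not being ngrammed.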
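If "university" should be ngrammed as well, one option along the lines of Paul's experiment is an analyzer built on the whitespace tokenizer. The Python snippet below only builds and prints a hypothetical index-creation body; nothing here comes from the thread except the analyzer and filter names. The gram sizes are guesses consistent with the ya/yal/yale grams in the original output, the filter type name depends on the Elasticsearch version (2014-era releases used the camelCase edgeNGram, current releases use edge_ngram), and syns_filter is omitted because its definition is not in the thread.

```python
import json

# Hypothetical index settings: the thread's analyzer with the tokenizer switched
# from "keyword" to "whitespace". syns_filter is left out because its definition
# is not shown; add it back (between "lowercase" and "my_edgeNgram") together
# with its existing definition. min_gram/max_gram are illustrative guesses.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_edgeNgram": {
                    # Older releases used "edgeNGram" here.
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 4,
                }
            },
            "analyzer": {
                "autocomplete_index": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "my_edgeNgram"],
                }
            },
        }
    }
}

# Print the JSON body that would be sent when creating the index.
print(json.dumps(settings, indent=2))
```

Analysis settings on an existing index can only be changed while the index is closed, and documents that were already indexed keep their old tokens until they are reindexed, so creating a new index with these settings is often the simpler route.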

Related topics

Topic	Category	Replies	Views	Activity
Customized analyzers behave	Elasticsearch	2	413	July 6, 2017
Problem with synonym token filter	Elasticsearch	8	460	July 6, 2017
Elasticsearch can't hanlde space after add analyzer	Elasticsearch	3	405	April 21, 2022
Keyword analyzer but allow redundant white spaces	Elasticsearch	3	4092	January 15, 2018
Query_string breaks search term on space even when keyword tokenizer is used	Elasticsearch	2	3403	November 15, 2018
