Search with whitespace again

Alexander_Sviridov · March 16, 2012, 4:43pm

Hello!
I've run into the same problem as described in this post:
http://elasticsearch-users.115913.n3.nabble.com/Search-with-white-space-td2574112.html

The query returns nothing when there is more than one search
term: the query "Centre" returns the document containing "Centre Est",
but "Centre E" returns nothing.

I tried the "shingle" token filter and the "nGram" tokenizer
(separately), with no luck.

Here is the config:

index :
  analysis :
    analyzer :
      myAnalyzer2 :
        type : custom
        tokenizer : standard
        filter : [shingle]

mapping:
"street" : {
  "type" : "multi_field",
  "fields" : {
    "street_untouched" : {
      "include_in_all" : false,
      "index" : "not_analyzed",
      "type" : "string"
    },
    "street_shingles" : {
      "include_in_all" : false,
      "analyzer" : "myAnalyzer2",
      "type" : "string"
    },
    "street" : {
      "type" : "string"
    }
  }
},

And this is the query:

{"size"=>20,
 "query"=>
  {"bool"=>
    {"should"=>
      [{"prefix"=>{"street"=>"some str"}},
       {"prefix"=>{"street_shingles"=>"some str"}},
       {"prefix"=>{"street_untouched"=>"some str"}}]}}}

I would appreciate any corrections, advice, or debugging tips (is there a
way to peek inside the index?).
Thank you!

kimchy · March 17, 2012, 10:36am

The prefix query / filter does not analyze the searched text, so
it basically expects a "full" match against a single indexed term. For
example, "hello world", when indexed, is broken down into "hello" and
"world" (with the standard analyzer). A prefix query on "hell" works
(because "hello" starts with "hell"), but a prefix query for "hello wo"
does not, because there is no term that starts with "hello wo" (we have
"hello" and "world").
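
To make the mechanics concrete, here is a minimal Python sketch of the behaviour described above. It is not Elasticsearch code: `analyze` and `prefix_query` are hypothetical helpers, and the regex split plus lowercasing is only a rough stand-in for the standard analyzer.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer (assumption):
    split on non-word characters and lowercase each token."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def prefix_query(indexed_terms, prefix):
    """Model of a prefix query: the prefix is NOT analyzed, it is
    compared as-is against every individual indexed term."""
    return [term for term in indexed_terms if term.startswith(prefix)]


terms = analyze("hello world")
print(terms)                            # ['hello', 'world']
print(prefix_query(terms, "hell"))      # ['hello']
print(prefix_query(terms, "hello wo"))  # []
```

The last call returns nothing because the space is part of the prefix, and no single term in the index contains a space.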

What you can do is use the text query with the text_phrase_prefix option,
which builds a more complex query that tries to act as a proper analyzed
prefix query. Note, though, that when using ngram you usually don't need
a prefix query at all: ngram already "does" it for you at indexing time.
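
As a hedged illustration, the Python sketch below builds a request body using the `text_phrase_prefix` form from the Elasticsearch documentation of that era (later releases renamed the `text` query family to `match`, so the equivalent there is `match_phrase_prefix`). The field name `street` and the search text come from the original post. `phrase_prefix_match` is a hypothetical toy model of the matching rule, not the real implementation: it ignores details such as how many terms the last prefix is expanded to.

```python
import json
import re


def analyze(text):
    """Rough stand-in for the standard analyzer (assumption)."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def phrase_prefix_match(doc_text, query_text):
    """Toy model: the query text is analyzed, every term except the last
    must appear consecutively and in order, and the last term only has
    to be a prefix of the term that follows."""
    doc = analyze(doc_text)
    query = analyze(query_text)
    if not query:
        return False
    *head, last = query
    for start in range(len(doc) - len(query) + 1):
        window = doc[start:start + len(query)]
        if window[:-1] == head and window[-1].startswith(last):
            return True
    return False


# Request body replacing the three prefix clauses from the question.
query_body = {
    "size": 20,
    "query": {
        "text_phrase_prefix": {
            "street": "Centre E"
        }
    },
}

print(json.dumps(query_body, indent=2))
print(phrase_prefix_match("Centre Est", "Centre E"))  # True
print(phrase_prefix_match("Centre Est", "Est C"))     # False
```

Because the query text is analyzed first, "Centre E" becomes two terms, and only the last one is treated as a prefix. That is why this form can match "Centre Est" where a plain prefix query cannot.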

Alexander_Sviridov · March 17, 2012, 3:43pm

Oh, it's exactly what I was looking for, and there is no need for
multi fields and custom analyzers, as far as I can see.

Thank you very much!
