Search with whitespace again


(Alexander Sviridov) #1

Hello!
I've run into the same problem as the one described in this post:
http://elasticsearch-users.115913.n3.nabble.com/Search-with-white-space-td2574112.html

The query just doesn't return anything if there is more than one search
term: the query "Centre" returns a document containing "Centre Est", but
"Centre E" returns nothing.

I tried the "shingle" token filter and the "nGram" tokenizer
(separately), with no luck :(

Here is the config:

index :
    analysis :
        analyzer :
            myAnalyzer2 :
                type : custom
                tokenizer : standard
                filter : [shingle]

mapping:
"street" : {
    "type" : "multi_field",
    "fields" : {
        "street_untouched" : {
            "include_in_all" : false,
            "index" : "not_analyzed",
            "type" : "string"
        },
        "street_shingles" : {
            "include_in_all" : false,
            "analyzer" : "myAnalyzer2",
            "type" : "string"
        },
        "street" : {
            "type" : "string"
        }
    }
},

And this is the query:

{"size"=>20,
 "query"=>
  {"bool"=>
    {"should"=>
      [{"prefix"=>{"street"=>"some str"}},
       {"prefix"=>{"street_shingles"=>"some str"}},
       {"prefix"=>{"street_untouched"=>"some str"}}]}}}

I would appreciate any corrections, advice or debugging tips (is there a
way to peek inside the index?).
Thank you!
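
On the "peek inside the index" question: Elasticsearch's _analyze API (run against the index with analyzer=myAnalyzer2, for example) shows the real tokens an analyzer produces. As a rough offline sketch, the Ruby below approximates the terms each sub-field of "street" would hold for "Centre Est". Assumptions, not Elasticsearch code: the standard tokenizer is approximated by splitting on non-alphanumeric characters, the default string analyzer is approximated as that plus lowercasing, and the shingle filter is assumed to use its defaults (two-word shingles, unigrams kept). The helper names are hypothetical.

```ruby
# Approximation of the standard tokenizer: runs of letters/digits.
# (Assumption: real tokenization rules are more involved.)
def standard_tokens(text)
  text.scan(/[[:alnum:]]+/)
end

# "street": default analyzer, approximated as tokenize + lowercase.
def street_terms(text)
  standard_tokens(text).map(&:downcase)
end

# "street_shingles": myAnalyzer2 = standard tokenizer + shingle filter.
# No lowercase filter is configured, so case is preserved.
# Assumed defaults: shingles of up to 2 tokens, unigrams also emitted.
def shingle_terms(text, max_size = 2)
  tokens = standard_tokens(text)
  terms = []
  tokens.each_index do |i|
    terms << tokens[i] # the unigram itself
    (2..max_size).each do |n|
      break if i + n > tokens.size
      terms << tokens[i, n].join(" ") # shingle starting at position i
    end
  end
  terms
end

# "street_untouched": not_analyzed, so the whole value is one term.
def untouched_terms(text)
  [text]
end

value = "Centre Est"
puts "street:           #{street_terms(value).inspect}"
puts "street_shingles:  #{shingle_terms(value).inspect}"
puts "street_untouched: #{untouched_terms(value).inspect}"
```

Comparing these lists with the prefix queries above is a quick way to see which terms a given prefix could possibly hit in each field.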


(Shay Banon) #2

The prefix query and filter do not perform analysis on the searched text, so
the whole text has to be the beginning of a single indexed term. For example,
"hello world", when indexed, is broken down into "hello" and "world" (with the
standard analyzer). A prefix query on "hell" works well (because "hello"
starts with "hell"), but a prefix query for "hello wo" will not work,
because there is no term that starts with "hello wo" (we only have "hello"
and "world").

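A minimal Ruby model of the behaviour just described, assuming the standard analyzer can be approximated as "split on non-alphanumerics, then lowercase". The indexed text is analyzed, the prefix is not, and the prefix must be the start of one indexed term. The helpers analyze and prefix_match? are hypothetical, not Elasticsearch APIs.

```ruby
# Approximation of the standard analyzer (assumption): tokenize and lowercase.
def analyze(text)
  text.scan(/[[:alnum:]]+/).map(&:downcase)
end

# Prefix query model: the prefix is used verbatim (no analysis) and must
# be the start of at least one single indexed term.
def prefix_match?(indexed_terms, prefix)
  indexed_terms.any? { |term| term.start_with?(prefix) }
end

terms = analyze("hello world")         # indexed terms: "hello", "world"
puts prefix_match?(terms, "hell")      # "hello" starts with "hell"
puts prefix_match?(terms, "hello wo")  # no single term starts with "hello wo"
```

Because the prefix itself is not analyzed, it is not lowercased either, so in this model prefix_match?(terms, "Hell") is false as well.
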
What you can do is use the text query with the text_phrase_prefix option
(http://www.elasticsearch.org/guide/reference/query-dsl/text-query.html),
which builds a more complex query that tries to act as a properly analyzed
prefix query. Note, though, that when using ngrams you usually don't need
a prefix query at all: ngrams already "do" it for you at indexing time.
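
To illustrate what "a proper analyzed prefix query" means here, a simplified Ruby model of phrase-prefix matching: both the document and the search text go through the same (approximated) analyzer, every search term except the last must match consecutive positions exactly, and only the last term is treated as a prefix. This is a sketch of the idea, not how Elasticsearch implements it; the real query also caps how many terms the final prefix may expand to (its max_expansions setting). The helper names are hypothetical.

```ruby
# Approximation of the standard analyzer (assumption): tokenize and lowercase.
def analyze(text)
  text.scan(/[[:alnum:]]+/).map(&:downcase)
end

# True if the analyzed query terms appear consecutively in the analyzed
# document, all matched exactly except the last, which only needs to be
# a prefix of the term at its position.
def phrase_prefix_match?(doc_text, query_text)
  doc = analyze(doc_text)
  query = analyze(query_text)
  return false if query.empty? || query.size > doc.size

  *exact, last = query
  (0..doc.size - query.size).any? do |start|
    exact.each_with_index.all? { |term, i| doc[start + i] == term } &&
      doc[start + exact.size].start_with?(last)
  end
end

puts phrase_prefix_match?("Centre Est", "Centre E")  # "centre" exact, "e" prefix of "est"
puts phrase_prefix_match?("Centre Est", "Centre")    # single term used as a prefix
puts phrase_prefix_match?("Centre Est", "Est C")     # wrong order, no match
```

Unlike the plain prefix query, "Centre E" is split into "centre" and "e" before matching, which is why it can find "Centre Est".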

(In reply to Alexander Sviridov (alexander.sviridov@gmail.com), Fri, Mar 16, 2012 at 6:43 PM)



(Alexander Sviridov) #3

Oh, it's exactly what I was looking for, and there is no need for
multi fields and custom analyzers, as far as I can see.

Thank you very much!
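
Following that remark, the request from the first post might shrink to a single text_phrase_prefix clause on the plain "street" field. A minimal Ruby sketch that builds such a request body; the query name is taken from the text query documentation linked in the reply above (later Elasticsearch versions call it match_phrase_prefix), so check it against the version you run. The street_query helper is hypothetical.

```ruby
require "json"

# Hypothetical helper: one analyzed field, no multi_field or custom
# analyzer; "text_phrase_prefix" analyzes the input and treats the
# last term as a prefix.
def street_query(input, size: 20)
  {
    "size" => size,
    "query" => {
      "text_phrase_prefix" => { "street" => input }
    }
  }
end

puts JSON.pretty_generate(street_query("Centre E"))
```
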

(In reply to Shay Banon kim...@gmail.com, Mar 17, 2:36 pm)


