Confused about query_string and the use of wildcards

Enrique_Medina_Monte · March 15, 2011, 7:52pm

Hi,

I'm struggling to understand why a simple query like this one:

"query": {
  "query_string": {
    "default_operator": "AND",
    "query": "*phone"
  }
}

does not return any results, whereas this one (notice the extra '*' as a
suffix to the query):

"query": {
  "query_string": {
    "default_operator": "AND",
    "query": "*phone*"
  }
}

does return some results. I can't understand why, to be honest...

Thanks.

kimchy · March 15, 2011, 10:56pm

Maybe because you have terms that don't end with phone?
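To make Shay's hint concrete: a wildcard pattern is tested against each indexed term as a whole, so "*phone" only matches terms that end in "phone", while "*phone*" also matches terms that merely contain it. Below is a minimal Java sketch of that comparison, assuming '*' means any run of characters and '?' a single character within one term. The class and method names are hypothetical, and this is not Lucene's own WildcardQuery code.

```java
import java.util.List;
import java.util.regex.Pattern;

// Sketch: how one wildcard pattern is compared against individual indexed terms.
// '*' = any run of characters, '?' = exactly one character (assumed semantics).
// Illustration only, not Lucene's WildcardQuery implementation.
final class WildcardSemantics {

    /** Converts a wildcard pattern into an anchored regular expression. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** True if the WHOLE term matches the pattern. */
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        List<String> terms = List.of("iphone", "phones", "telephone", "iphon");
        for (String pattern : List.of("*phone", "*phone*")) {
            for (String term : terms) {
                System.out.println(pattern + " vs " + term + " -> " + matches(pattern, term));
            }
        }
    }
}
```

Note that neither pattern matches the stemmed term "iphon", which becomes relevant later in the thread.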

Enrique_Medina_Monte · March 16, 2011, 9:08am

Shay,

But what about iPhone? Shouldn't it be included in the results for
"*phone"?

Or maybe it's just that '*' doesn't work here as a real wildcard, like '%'
in SQL?

Thanks.


Clinton_Gormley · March 16, 2011, 10:02am

Hi Enrique

> But what about iPhone? Shouldn't it be included as part of the results
> for "*phone"?
>
> Or maybe it's just that the '*' doesn't work here as a real wildcard,
> as in SQL a '%'?

It is the same as % in SQL, and your example works for me.

I suggest you gist a complete curl recreation, from index creation, data
indexing, and searching to demonstrate the problem.

clint

Enrique_Medina_Monte · March 16, 2011, 10:36am

Clinton,

I found the issue: it was in my default Spanish analyzer. For some
reason, iPhone gets analyzed in Spanish like this:

http://localhost:9200/mytest/_analyze?text=iPhone+4

{"tokens":[{"token":"iphon","start_offset":0,"end_offset":6,"type":"","position":1},{"token":"4","start_offset":7,"end_offset":8,"type":"","position":2}]}

whereas in the case of the default analyzer it gets like this:

http://localhost:9200/mytest/_analyze?text=iPhone+4&analyzer=standard

{"tokens":[{"token":"iphone","start_offset":0,"end_offset":6,"type":"","position":1},{"token":"4","start_offset":7,"end_offset":8,"type":"","position":2}]}

Hence the Spanish token "iphon" was not matching "phone", but it does
match "phon".

What do you recommend in these particular cases? Adding iPhone as a stop
word?

Thanks.


Clinton_Gormley · March 16, 2011, 10:51am

Hi Enrique

> I found the issue and it was on my default Spanish analyzer. For some
> reason, iPhone gets analyzed in Spanish like this:
>
> http://localhost:9200/mytest/_analyze?text=iPhone+4
> {"tokens":[{"token":"iphon","start_offset":0,"end_offset":6,"type":"","position":1},{"token":"4","start_offset":7,"end_offset":8,"type":"","position":2}]}

Presumably you're using the snowball stemmer? It analyzes 'iphone' as
'iphon' to be able to recognise eg "cansada" and "cansado" as the same
stem.

All you need to do is to be sure that you're using the same analyzer at
index time as at search time.
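A minimal Java sketch of why the two analyzers have to agree. Both analyzers here are hypothetical toys: `toyStemming` strips one trailing 'e' only because that reproduces the "iphone" -> "iphon" output shown above, whereas the real Spanish Snowball stemmer does much more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Sketch: why index-time and search-time analysis must agree.
// Both analyzers are hypothetical toys, NOT Lucene/Elasticsearch analyzers.
final class AnalyzerMismatch {

    /** Toy "standard" analyzer: split on whitespace and lowercase. */
    static List<String> standard(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Toy stemming analyzer: standard(), then strip one trailing 'e'. */
    static List<String> toyStemming(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : standard(text)) {
            tokens.add(t.length() > 1 && t.endsWith("e") ? t.substring(0, t.length() - 1) : t);
        }
        return tokens;
    }

    /** AND semantics: every analyzed query token must be among the indexed tokens. */
    static boolean search(List<String> indexed, String query,
                          Function<String, List<String>> searchAnalyzer) {
        return indexed.containsAll(searchAnalyzer.apply(query));
    }

    public static void main(String[] args) {
        List<String> indexed = toyStemming("iPhone 4"); // [iphon, 4]
        System.out.println(search(indexed, "iPhone", AnalyzerMismatch::standard));    // false
        System.out.println(search(indexed, "iPhone", AnalyzerMismatch::toyStemming)); // true
    }
}
```

Passing the search-time analyzer as a parameter mirrors the choice Clinton describes: the index side is fixed by the mapping, and the query side has to be configured to match it.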

You have a few options here:

- you're searching on a field (e.g. product_name): set the 'analyzer'
  for that field to be the Spanish stemmer when you put the mapping
- you're searching on the '_all' field (which is the default): set the
  analyzer for the '_all' field to be the Spanish stemmer when you put
  the mapping (this is probably not what you want, as the _all field
  will contain some fields which shouldn't have the stemmer applied)
- you can't determine at mapping time which language you're going to be
  using at search time: specify the analyzer in the query_string query
  itself
- you could do something whizzy per document with the _analyzer field

clint

Enrique_Medina_Monte · March 16, 2011, 11:03am

Clinton,

Based on some other discussion with Shay, I defined this in the
elasticsearch.yml config file:

index:
  analysis:
    analyzer:
      default:
        type: es.cuestamenos.lucene.analizadores.SpanishAnalyzerProvider

And the analyzer is this one:

https://gist.github.com/emedina/872322

SpanishAnalyzer.java (excerpt; the full file is in the gist):

import java.io.Reader;
import java.util.Arrays;
import java.util.HashSet;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ASCIIFoldingFilter;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballFilter;

[...]

Shouldn't that be enough both for index and search?

Thanks.


Clinton_Gormley · March 16, 2011, 11:10am

Hi Enrique

> Based on some other discussion with Shay, I defined this in the
> elasticsearch.yml config file:
>
> index:
>   analysis:
>     analyzer:
>       default:
>         type: es.cuestamenos.lucene.analizadores.SpanishAnalyzerProvider
>
> Shouldn't that be enough both for index and search?

I would have thought so. But it doesn't appear to be applied at search
time. Are you searching against a specific field, or against _all?

If it works against a specific field, but not against _all, then perhaps
there is a bug.

A complete curl recreation would be useful

clint

Enrique_Medina_Monte · March 16, 2011, 11:23am

Clinton,

I'm searching against "_all", which is the default.

I get consistent results (even the lack of results) when adding a specific
field or specific analyzer:

{
  "query": {
    "query_string": {
      "default_operator": "AND",
      "query": "*phon",
      "default_field": "name",
      "analyzer": "default"
    }
  }
}

So I guess it's not a bug but, as explained in my previous email, the
Spanish analyzer created the token "iphon" for iPhone, so no matter
how I search, it will never match "*phone", right?

Regards.


Clinton_Gormley · March 16, 2011, 11:33am

hi Enrique

I've just remembered your original question, which was:

"*phone"
vs
"*phone*"

As I understand it, the way this wildcard search works is that Lucene
looks up all matching terms, and searches against each of these.
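The lookup Clinton describes can be sketched in Java as a two-step process: expand the pattern against the term dictionary, then search for each expanded term. The two term sets below come from the `_analyze` output earlier in the thread. The class name and the regex-based expansion are illustrative assumptions, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Sketch: a wildcard query is first expanded against the term dictionary;
// the query then searches for each matching term. Names are hypothetical.
final class WildcardExpansion {

    /** Returns the indexed terms (in sorted order) that the wildcard matches. */
    static List<String> expand(String wildcard, Set<String> termDictionary) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern p = Pattern.compile(regex.toString());
        List<String> hits = new ArrayList<>();
        for (String term : new TreeSet<>(termDictionary)) { // sorted, like a term dictionary
            if (p.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> stemmedIndex = Set.of("iphon", "4");   // what the Spanish analyzer stored
        Set<String> standardIndex = Set.of("iphone", "4"); // what the standard analyzer stores
        System.out.println(expand("*phone", stemmedIndex));  // []
        System.out.println(expand("*phone", standardIndex)); // [iphone]
    }
}
```

An empty expansion means there is nothing left to search for, which is exactly the "no results" symptom from the original question.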

So for some reason, "*phone" doesn't find the right term, but
"*phone*" does.

> I get consistent results (even the lack of results) when adding a
> specific field or specific analyzer:

You mean, you see the same thing?

> So I guess it's not a bug, but as explained in my previous email, the
> fact that the Spanish analyzer created a token = "iphon" for iPhone so
> no matter how I search, it will never match "*phone", right?

No. This should work. For instance, using the default analyzer, if you
index "The Quick BROWN fox" you end up with the terms
"quick","brown","fox"

If you then search for "The Quick BROWN fox", it performs the same
analysis, resulting in the same terms, and searches for those.
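A small Java sketch of Clinton's example, assuming a simplified default analyzer that splits on whitespace, lowercases, and drops the stop word "the". The stop list and class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch: the same analysis chain runs at index time and at query time,
// so both sides produce identical terms. Simplified, hypothetical chain:
// whitespace split -> lowercase -> drop stop words.
final class SameAnalysis {
    private static final Set<String> STOP_WORDS = Set.of("the"); // assumed stop list

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("The Quick BROWN fox");
        List<String> queried = analyze("The Quick BROWN fox");
        System.out.println(indexed);                 // [quick, brown, fox]
        System.out.println(indexed.equals(queried)); // true
    }
}
```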

So to me (and I'm ignorant of the Lucene internals) it sounds like a
potential bug in the Lucene query parser syntax.

A complete recreation would be very useful for debugging.

clint

Joaquin_Cuenca_Abela · March 16, 2011, 11:52am

I don't know why your pattern search is not working, but in this
analyzer you're ASCII-folding the terms before you remove the stop
words, and your stop words contain non-ASCII letters. You should
either ASCII-fold your stop words or remove them before the
ASCII-folding step.
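Joaquin's ordering problem can be reproduced in a few lines of Java. `fold` approximates ASCII folding by decomposing characters and stripping combining marks with `java.text.Normalizer`. This is an assumption for illustration, not Lucene's `ASCIIFoldingFilter`, and the stop words are just examples.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Set;

// Sketch: when tokens are ASCII-folded BEFORE the stop filter, an accented
// stop word can never match, because the token has already lost its accent.
// Folding the stop list as well fixes the ordering problem.
// fold() only removes combining marks; it is not Lucene's ASCIIFoldingFilter.
final class FoldThenStop {

    /** Removes combining diacritical marks, e.g. "m\u00e1s" becomes "mas". */
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    /** Folds every stop word so the list lines up with folded tokens. */
    static Set<String> foldAll(Set<String> words) {
        Set<String> folded = new HashSet<>();
        for (String w : words) {
            folded.add(fold(w));
        }
        return folded;
    }

    public static void main(String[] args) {
        // Example Spanish stop words, written with Unicode escapes.
        Set<String> stopWords = Set.of("m\u00e1s", "\u00e9l", "de");
        String token = fold("m\u00e1s"); // folding runs first, so the token is "mas"
        System.out.println(stopWords.contains(token));          // false: stop word is kept
        System.out.println(foldAll(stopWords).contains(token)); // true: stop word is removed
    }
}
```

The other fix Joaquin mentions, running the stop filter before folding, gives the same result without touching the stop list.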

Cheers,

--
Joaquin Cuenca Abela -- presspeople.com: Press sources and releases

Enrique_Medina_Monte · March 16, 2011, 1:21pm

Nice catch, Joaquín.

I'll fix it and try to recreate the issue for Clinton.


Enrique_Medina_Monte · March 16, 2011, 1:38pm

I think I found the issue without having to do a full recreation...

If I search using this:

{
  "query": {
    "query_string": {
      "default_operator": "AND",
      "query": "iphone"
    }
  }
}

then it works as expected, I do get the expected results with the word
"iPhone".

However, if I use:

{
  "query": {
    "query_string": {
      "default_operator": "AND",
      "query": "*phone"
    }
  }
}

then I don't get them. It seems that when you specify a wildcard in the
query, it's not being analyzed as it should be:

http://localhost:9200/mytest/_analyze?text=*phone

{"tokens":[{"token":"phon","start_offset":0,"end_offset":5,"type":"","position":1}]}

Therefore the wildcard is lost when tokenizing it and the search
doesn't return any results, as "iPhone" doesn't match the token
"phon".

Does this make sense now?


Enrique_Medina_Monte · March 16, 2011, 1:43pm

Which makes me think, is the '*' actually acting as a wildcard in the query
or is it interpreted by Lucene as just another character that has to be
analyzed, as explained in my previous email, therefore losing all the
wildcard information for the search?


Clinton_Gormley · March 16, 2011, 1:47pm

Hi Enrique

On Wed, 2011-03-16 at 14:38 +0100, Enrique Medina Montenegro wrote:

> I think I found the issue without having to do a full recreation...

The reason I keep asking for a complete recreation is so that Shay has
got a test case to figure out where the bug is. The easier you make
things for him, the more likely your bug will get attended to.

> then I don't get them. It seems that when you specify a wildcard in
> the query, it's not being properly analyzed like it should:

Yes, I agree.

> http://localhost:9200/mytest/_analyze?text=*phone
>
> {"tokens":[{"token":"phon","start_offset":0,"end_offset":5,"type":"","position":1}]}
>
> Therefore the wildcard is lost when tokenizing it and the search
> doesn't return any results, as "iPhone" doesn't match the token
> "phon".

Not quite - the analyze API is just one part of this. What you're not
seeing is the Lucene query parser in action. That's where I think the
bug is.
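The parser step Clinton means can be sketched as follows. It assumes, consistent with what this thread observed, that a plain term is sent through the analyzer while a term containing '*' or '?' is only lowercased and never reaches the stemmer. The toy `analyze` method and the class name are hypothetical.

```java
import java.util.Locale;

// Sketch of the query-parser behaviour discussed here: a plain term goes
// through the analyzer, while a wildcard term is only lowercased and used
// as-is, so the stemmer never sees it. The lowercase-only treatment is an
// assumption matching this thread's observations; names are hypothetical.
final class ParserSketch {

    /** Toy stand-in for the Spanish analyzer: lowercase, strip one trailing 'e'. */
    static String analyze(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        return t.length() > 1 && t.endsWith("e") ? t.substring(0, t.length() - 1) : t;
    }

    /** Returns the term (or pattern) that would actually be looked up in the index. */
    static String termToSearch(String userTerm) {
        boolean isWildcard = userTerm.indexOf('*') >= 0 || userTerm.indexOf('?') >= 0;
        return isWildcard ? userTerm.toLowerCase(Locale.ROOT) : analyze(userTerm);
    }

    public static void main(String[] args) {
        System.out.println(termToSearch("iPhone"));  // iphon   (analyzed, matches the index)
        System.out.println(termToSearch("iPhone*")); // iphone* (not analyzed, cannot match "iphon")
    }
}
```

This is why the `_analyze` output for "*phone" is misleading: the analyze API shows what the analyzer would do, but under this assumption the parser never hands the wildcard term to the analyzer in the first place.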

I suggest that you gist a complete recreation and post an issue to the elastic/elasticsearch issue tracker on GitHub.

ta

clint

Enrique_Medina_Monte · March 16, 2011, 1:58pm

Yes, I will post the recreation right after this email.

I did some more testing with the wildcard, and it seems that wildcards do
not match blank spaces, so if you specify "iphone" it will not match a
name of "iPhone 4", but something like "iPad/iPhone 4/iPod".

Is this the expected behaviour or am I missing something?


Clinton_Gormley · March 16, 2011, 2:11pm

On Wed, 2011-03-16 at 14:58 +0100, Enrique Medina Montenegro wrote:

> Yes, I will post the recreation right after this email.

thanks

> I did some more testing with the wildcard, and it seems that wildcards
> do not match blank spaces, so if you specify "iphone" it will not
> match a name of "iPhone 4", but something like "iPad/iPhone 4/iPod".
>
> Is this the expected behaviour or am I missing something?

This is correct - so I was wrong in saying that * is equivalent to % in
SQL. It works only on a per-word basis.

Also searching for '"ipho*"' (ie in double quotes) would not work, as
the * would be interpreted literally, rather than as a wildcard.
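The difference between per-term wildcards and SQL `LIKE` can be sketched in Java. Tokenization here is a plain whitespace split and stemming is ignored; both are simplifying assumptions, and the helper names are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Sketch contrasting the two models compared above:
//  - per-term: the pattern must match ONE indexed term;
//  - SQL LIKE-style: the pattern is matched against the whole stored string.
// Whitespace tokenization, no stemming, hypothetical helper names.
final class PerTermVsLike {

    /** Compiles a pattern where anyRun stands for any run of characters. */
    static Pattern compile(String pattern, char anyRun) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            regex.append(c == anyRun ? ".*" : Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(regex.toString());
    }

    /** Per-term: true if any single token of the text matches the '*' pattern. */
    static boolean perTermMatch(String wildcard, String text) {
        Pattern p = compile(wildcard.toLowerCase(Locale.ROOT), '*');
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (p.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    /** LIKE-style: the '%' pattern must match the whole text, spaces included. */
    static boolean likeMatch(String likePattern, String text) {
        Pattern p = compile(likePattern.toLowerCase(Locale.ROOT), '%');
        return p.matcher(text.toLowerCase(Locale.ROOT)).matches();
    }

    public static void main(String[] args) {
        System.out.println(perTermMatch("iphone*4", "iPhone 4")); // false: "iphone" and "4" are separate terms
        System.out.println(likeMatch("iphone%4", "iPhone 4"));    // true: % spans the space
        System.out.println(perTermMatch("iphone*", "iPhone 4"));  // true: matches the term "iphone"
    }
}
```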

clint

Enrique_Medina_Monte · March 16, 2011, 2:26pm

Then it's definitely clear that it's not a bug, but a side effect of my
Spanish analyzer tokenizing "iPhone" as "iphon", which therefore matches
neither "phone" (the token is different) nor "*phone" (the wildcard is
applied to the word as typed, not to its token).

I wonder if there's any sort of query in Lucene that acts like a '%' SQL
wildcard, so that, for instance, when I specify "*phone", instead of
matching tokens made of the literal "phone" with something before it, it
would first tokenize the literal, i.e. "phon", and then perform the
search, which would definitely match my "iPhone"...

Maybe the solution is to tokenize the words entered by the user before
applying the wildcard, and then pass the tokenized version to the query
eventually...
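Enrique's workaround (analyze the user's word first, then add the wildcard) might look like the Java sketch below. `analyze` is the same kind of toy stand-in for the stemmer (lowercase, strip one trailing 'e'), not the real Spanish Snowball stemmer, and stemming a partial word will not always produce the stem of the full word. `matchesTerm` only handles a single leading wildcard. Later Elasticsearch versions also expose an `analyze_wildcard` option on `query_string` for this purpose.

```java
import java.util.Locale;

// Sketch of the proposed workaround: analyze the user's word with the same
// (toy) analysis as the index, THEN add the wildcard. analyze() is a
// hypothetical stand-in, not the real Spanish Snowball stemmer.
final class AnalyzeThenWildcard {

    static String analyze(String word) {
        String t = word.toLowerCase(Locale.ROOT);
        return t.length() > 1 && t.endsWith("e") ? t.substring(0, t.length() - 1) : t;
    }

    /** Builds a leading-wildcard pattern from the analyzed form of the user's word. */
    static String leadingWildcard(String userWord) {
        return "*" + analyze(userWord);
    }

    /** Leading-wildcard only: true if the indexed term ends with the text after '*'. */
    static boolean matchesTerm(String pattern, String indexedTerm) {
        return pattern.startsWith("*") && indexedTerm.endsWith(pattern.substring(1));
    }

    public static void main(String[] args) {
        String pattern = leadingWildcard("phone");
        System.out.println(pattern);                         // *phon
        System.out.println(matchesTerm(pattern, "iphon"));   // true
        System.out.println(matchesTerm("*phone", "iphon"));  // false
    }
}
```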


Clinton_Gormley · March 16, 2011, 3:07pm

On Wed, 2011-03-16 at 15:26 +0100, Enrique Medina Montenegro wrote:

> Then it's definitely clear that it's not a bug, but a side effect of
> my Spanish analyzer tokenizing "iPhone" as "iphon", therefore not
> matching "phone" (token is different) or "*phone" (wildcard takes word
> as a term, not its token).

OK, I'm tired of trying to convince you that this is a bug. So I've
opened the issue for you, with a recreation:
opened the issue for you, with a recreation:

github.com/elastic/elasticsearch

WIldcard not working with snowball stemmer (issue #784)

opened 03:06PM - 16 Mar 11 UTC by clintongormley, closed 07:23PM - 04 Apr 13 UTC

Hiya

A query string search with a wildcard on a field that has passed through the snowball stemmer is not working correctly.

Text indexed: `I have an iPhone`

Search for: `iphone` works, but search for `iphone*` doesn't

```
# [Wed Mar 16 16:02:34 2011] Protocol: http, Server: 192.168.5.103:9200
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
   "mappings" : {
      "doc" : {
         "properties" : {
            "text" : { "type" : "string", "analyzer" : "spanish" }
         }
      }
   },
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "spanish" : { "language" : "Spanish", "type" : "snowball" }
         }
      }
   }
}
'

# [Wed Mar 16 16:02:34 2011] Response:
# { "ok" : true, "acknowledged" : true }

# [Wed Mar 16 16:02:39 2011] Protocol: http, Server: 192.168.5.103:9200
curl -XPOST 'http://127.0.0.1:9200/test/doc?pretty=1' -d '
{ "text" : "I have an iPhone" }
'

# [Wed Mar 16 16:02:39 2011] Response:
# {
#    "ok" : true,
#    "_index" : "test",
#    "_id" : "imN8_G5rTwGwuESyTEz8pg",
#    "_type" : "doc",
#    "_version" : 1
# }

# [Wed Mar 16 16:02:45 2011] Protocol: http, Server: 192.168.5.103:9200
curl -XGET 'http://127.0.0.1:9200/test/doc/_search?pretty=1' -d '
{ "query" : { "field" : { "text" : "iphone" } } }
'

# [Wed Mar 16 16:02:45 2011] Response:
# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : { "text" : "I have an iPhone" },
#             "_score" : 0.15342641,
#             "_index" : "test",
#             "_id" : "imN8_G5rTwGwuESyTEz8pg",
#             "_type" : "doc"
#          }
#       ],
#       "max_score" : 0.15342641,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : { "failed" : 0, "successful" : 5, "total" : 5 },
#    "took" : 2
# }

# [Wed Mar 16 16:02:48 2011] Protocol: http, Server: 192.168.5.103:9200
curl -XGET 'http://127.0.0.1:9200/test/doc/_search?pretty=1' -d '
{ "query" : { "field" : { "text" : "iphone*" } } }
'

# [Wed Mar 16 16:02:48 2011] Response:
# {
#    "hits" : {
#       "hits" : [],
#       "max_score" : null,
#       "total" : 0
#    },
#    "timed_out" : false,
#    "_shards" : { "failed" : 0, "successful" : 5, "total" : 5 },
#    "took" : 1
# }
```

clint

Enrique_Medina_Monte · March 16, 2011, 3:19pm

I was already working on the recreation, but if you already did, it's done.

All my confusion is about the expected behaviour of wildcard queries.
Let's say my user wants to search for "iPhone" and I run a query with
wildcards "iPhone"; then either:

1) As it works now, the search term is not analyzed and iPhone is used
as the token itself, so it doesn't find "iPhone", whose token is "iphon".
2) As I expected it to work, "iPhone" is analyzed into "iphon", the
search is then executed, and the "iPhone" results are returned.

So if the current behaviour is 1), it's not a bug, just a
misunderstanding on my side. If the behaviour should be 2), then there's
definitely a bug.

Looking forward to Shay's feedback on it.

