I think I found the issue without having to do a full recreation...
If I search using this:
{
  "query": {
    "query_string": {
      "default_operator": "AND",
      "query": "iphone"
    }
  }
}
then it works as expected: I get the results containing the word
"iPhone".
However, if I use:
{
  "query": {
    "query_string": {
      "default_operator": "AND",
      "query": "*phone"
    }
  }
}
then I don't get them. It seems that when the query contains a
wildcard, it isn't analyzed the way it should be:
http://localhost:9200/mytest/_analyze?text=*phone
{"tokens":[{"token":"phon","start_offset":0,"end_offset":5,"type":"","position":1}]}
So the wildcard is lost during tokenization, and the search returns
no results because "iPhone" doesn't match the token "phon".
Does this make sense now?
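
To show what I mean, here is a rough Python sketch. It is not Lucene or Elasticsearch code. The term list is made up, "iphon" is the token the Spanish analyzer produced for "iPhone", and fnmatchcase is only a stand-in for however the wildcard is really expanded against the index. I don't know which of the two wildcard readings below the query parser actually uses, but neither one finds the indexed term:

```python
from fnmatch import fnmatchcase

# Hypothetical index contents: terms as stored after analysis.
# "iphon" is the stemmed token for "iPhone"; the others are filler.
indexed_terms = ["iphon", "quick", "brown", "fox"]


def expand_wildcard(pattern, terms):
    """Return the stored terms that match a wildcard pattern literally."""
    return [t for t in terms if fnmatchcase(t, pattern)]


# Plain query: "iphone" is analyzed to "iphon" before lookup, so it matches.
plain_hit = "iphon" in indexed_terms

# Reading 1: "*phone" is analyzed to "phon" and the wildcard is dropped.
analyzed_hit = "phon" in indexed_terms

# Reading 2: "*phone" is kept as typed and compared with the stored terms.
literal_hits = expand_wildcard("*phone", indexed_terms)

# A pattern written against the stemmed form would match.
stemmed_hits = expand_wildcard("*phon", indexed_terms)

print(plain_hit, analyzed_hit, literal_hits, stemmed_hits)
```

If this picture is right, searching for "*phon" instead of "*phone" should find the documents, which would be an easy thing to check.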
On Wed, Mar 16, 2011 at 12:33 PM, Clinton Gormley
<clinton@iannounce.co.uk> wrote:
Hi Enrique
I've just remembered your original question, which was:
"*phone"
vs
"phone"
As I understand it, the way this wildcard search works is that Lucene
looks up all matching terms, and searches against each of these.
So for some reason, "*phone" doesn't find the right term, but
"phone" does.
I get consistent results (even the lack of results) when adding a
specific field or specific analyzer:
You mean, you see the same thing?
So I guess it's not a bug but, as explained in my previous email, a
consequence of the Spanish analyzer creating the token "iphon" for
iPhone, so no matter how I search, it will never match "*phone", right?
No. This should work. For instance, using the default analyzer, if you
index "The Quick BROWN fox" you end up with the terms
"quick","brown","fox"
If you then search for "The Quick BROWN fox", it performs the same
analysis, resulting in the same terms, and searches for those.
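
As a toy illustration of that idea in Python (this is not the real default analyzer; the STOPWORDS set and the analyze helper are made up for the example):

```python
# Hypothetical stand-in for an analyzer: lowercase, split on whitespace,
# and drop a small made-up stopword list.
STOPWORDS = {"the", "a", "an"}


def analyze(text):
    """Turn raw text into the list of terms that would be indexed or searched."""
    return [word for word in text.lower().split() if word not in STOPWORDS]


# The same analysis runs at index time and at search time.
index_terms = analyze("The Quick BROWN fox")
query_terms = analyze("The Quick BROWN fox")

# Every query term is present among the indexed terms, so the document matches.
matches = all(term in index_terms for term in query_terms)

print(index_terms, query_terms, matches)
```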
So to me (and I'm ignorant of the Lucene internals) it sounds like a
potential bug in the Lucene query parser.
A complete recreation would be very useful for debugging.
clint