Regexp in query_string not returning a matching document


(Robin Boutros) #1

I'm searching with this simple query (for an autocompletion feature):

{ query { string "title:#{keywords}*" } }

Let's say I have a document with the title: Eliminate texting while driving.

  • If keywords is elimin then the document is returned.
  • If keywords is elimina then it's not!

What am I missing? Is there some kind of "max distance" for *?

Thanks!

P.S. You can also answer on Stack Overflow if you want:
http://stackoverflow.com/questions/10354002/elasticsearch-tire-regexp-in-query-string-not-returning-a-matching-document


(Robin Boutros) #2

Forgot to say I was using Tire. It would translate to something like:

query_string: {
  default_field: title
  query: elimin*
}

But using Elasticsearch Head, I actually noticed that the "*" is useless.
"elimin" will match Eliminate texting while driving but "elimina" won't.
Could you tell me why?


(Robin Boutros) #3

Oh... I get it, it's because I'm using the snowball analyzer... So it comes
back to my original question of why the * doesn't work in my case...


(Igor Motov) #4

I think you already answered your question: it's because you are using the
snowball analyzer. The snowball analyzer converts Eliminate into the term elimin:

$ curl "localhost:9200/myindex/_analyze?text=Eliminate&analyzer=snowball"
{"tokens":[{"token":"elimin","start_offset":0,"end_offset":9,"type":"","position":1}]}

The query elimina* searches for terms that start with elimina, and there
are no such terms in the index. By default, wildcard terms are not analyzed;
they are only converted to lower case.
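
To make the mismatch concrete, here is a minimal Python sketch (a toy model, not Elasticsearch code). It hard-codes the single indexed term elimin taken from the _analyze output above and treats a trailing-* query as a lowercased prefix test; the helper name wildcard_matches is hypothetical.

```python
# Term the snowball analyzer stored for "Eliminate" (from the _analyze output above).
INDEXED_TERMS = {"elimin"}


def wildcard_matches(query, indexed_terms):
    """Model a trailing-* wildcard query: lowercase only, no stemming."""
    prefix = query.rstrip("*").lower()
    return sorted(term for term in indexed_terms if term.startswith(prefix))


print(wildcard_matches("elimin*", INDEXED_TERMS))   # the stem itself matches
print(wildcard_matches("Elimina*", INDEXED_TERMS))  # longer than the stem: no match
```

Because the query prefix is never stemmed, any prefix longer than the stored stem has nothing to match against.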


(Robin Boutros) #5

Indeed... I changed it to the standard analyzer and it's fine.

Thanks :)


(Shay Banon) #6

Note: for autocomplete, it is not recommended to use a wildcard (or
prefix) query. You can instead index the data with an analyzer based on edge
ngrams (and use the standard analyzer for search), which will provide better /
faster results (though a bigger index).
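
A rough Python sketch of the edge ngram idea (a toy model, not actual Elasticsearch code): at index time every title word is expanded into its leading prefixes, and at search time the typed keyword is only lowercased and looked up as an exact term. The function names and the min_gram / max_gram defaults are illustrative assumptions.

```python
import re


def edge_ngrams(word, min_gram=1, max_gram=20):
    """Return the leading prefixes of word, from min_gram up to max_gram characters."""
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


def index_terms(title, min_gram=1, max_gram=20):
    """Index-time sketch: split into words, lowercase, expand to edge ngrams."""
    terms = set()
    for word in re.findall(r"\w+", title.lower()):
        terms.update(edge_ngrams(word, min_gram, max_gram))
    return terms


def matches(keyword, terms):
    """Search-time sketch: lowercase the keyword and look it up as an exact term."""
    return keyword.lower() in terms


terms = index_terms("Eliminate texting while driving")
print(matches("elimin", terms))   # True
print(matches("elimina", terms))  # True
print(matches("elimino", terms))  # False
```

Storing every prefix is what makes the index bigger, but each lookup becomes a plain term match with no wildcard expansion at query time.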


(Robin Boutros) #7

That's exactly what I ended up with, kimchy!

This Stack Overflow answer was pretty helpful, in case someone is
interested in doing the same:
http://stackoverflow.com/questions/9421358/filename-search-with-elasticsearch

Thanks

