Query building question


(maik) #1

Hi all,

I've been using Elasticsearch for quite a while, but I'm not sure whether the
queries I build are OK.

I'm indexing products of an ecommerce webshop with only a few fields: name,
manufacturername, code, referencecode.

Visitors of the shop can use the global search field to start a search.
Let's say someone enters "usb stick" as the search string. My current query is
built as follows:

{
  "query": {
    "bool": {
      "should": [
        {
          "fuzzy_like_this_field": {
            "name": {
              "like_text": "usb stick",
              "boost": 10.0,
              "min_similarity": 0.7
            }
          }
        },
        {
          "fuzzy_like_this_field": {
            "manufacturer": {
              "like_text": "usb stick",
              "boost": 5.0,
              "min_similarity": 0.7
            }
          }
        },
        {
          "fuzzy_like_this_field": {
            "code": {
              "like_text": "usb stick",
              "boost": 1.0,
              "min_similarity": 0.7
            }
          }
        },
        {
          "fuzzy_like_this_field": {
            "refcode": {
              "like_text": "usb stick",
              "boost": 10,
              "min_similarity": 0.7
            }
          }
        }
      ]
    }
  }
}

I'm using the field-based fuzzy queries so that I can set a "boost" per field.
With that, if I'm right, I can specify, for example, that the name field is
more important than the code field.

The results are OK, nothing to grumble about. But this request takes
300-500 ms to execute.

So my final question: is it possible to rewrite the query so that it has the
same effect (field priority + fuzzy) but runs faster?

Index size: 50k documents
2 shards, 1 replica each
2 servers, more than enough memory / cpu.

Thank you in advance
Greetings
Maik


(Shay Banon) #2

Fuzzy queries are sadly slow. One option to work around that is to use
ngrams to analyze the fields at index time. I would use a multi field
mapping, with one field using the regular analyzer and one using ngrams,
and then search the field with the regular analyzer with a high boost and
the ngram-based one with a lower boost.
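
A rough sketch of that idea, written in Python only so the request bodies can be commented. Everything named here is an assumption for illustration: the analyzer and filter names, the `ngram` sub-field, the trigram size, and the factor of 10 between the two boosts. The settings and mapping follow the `fields` syntax of recent Elasticsearch releases; the release current at the time of this thread expressed the same thing with the `multi_field` type, so adapt it to your version. The `ngrams` helper is plain Python and only shows why a misspelled query still overlaps with the indexed term.

```python
import json


def ngrams(text, n=3):
    """Return the set of character n-grams of each lowercased word in text."""
    grams = set()
    for word in text.lower().split():
        # Words shorter than n produce no grams under this simple scheme.
        grams.update(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


# A typo still shares trigrams with the indexed term, which is why
# ngram matching can stand in for a fuzzy query.
shared = ngrams("usb stick") & ngrams("usb stik")

# Field names and boosts taken from the query in the original post.
FIELD_BOOSTS = {"name": 10.0, "manufacturer": 5.0, "code": 1.0, "refcode": 10.0}

# Hypothetical index body: "trigram_filter", "trigram_analyzer" and the
# "ngram" sub-field are made-up names, and the syntax is version dependent.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "trigram_filter": {"type": "ngram", "min_gram": 3, "max_gram": 3}
            },
            "analyzer": {
                "trigram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "trigram_filter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # Each field is indexed twice: once with the default analyzer,
            # once as "<field>.ngram" with the trigram analyzer.
            field: {
                "type": "text",
                "fields": {
                    "ngram": {"type": "text", "analyzer": "trigram_analyzer"}
                },
            }
            for field in FIELD_BOOSTS
        }
    },
}


def build_query(text, field_boosts, ngram_factor=0.1):
    """Build a bool/should query: exact field at full boost, ngram field lower."""
    should = []
    for field, boost in field_boosts.items():
        should.append({"match": {field: {"query": text, "boost": boost}}})
        should.append(
            {"match": {field + ".ngram": {"query": text, "boost": boost * ngram_factor}}}
        )
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    print(sorted(shared))
    print(json.dumps(build_query("usb stick", FIELD_BOOSTS), indent=2))
```

The per-field priority is kept by the boosts, and the tolerance for typos comes from the ngram sub-fields instead of from fuzzy term expansion at query time. The trade-off is a larger index, so the gram size and the boost ratio are worth tuning against real queries.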

(In reply to maik's post of Thu, May 31, 2012, 2:47 PM.)


(maik) #3

Thx Shay,

I'll give it a try.

Greetings
maik

