Query by keywords and phrase. Performance question

Victor_Soloh · February 18, 2013, 7:12pm

Hello,

I am relatively new to Elasticsearch. I do understand the basic concepts
but have trouble with more advanced queries.
I am trying to build a query which searches by several keywords on 2
fields: 'title' and 'description' where matches for 'title' are boosted.
Then I need to boost results which have phrase matches so that they appear
first after sorting. (Also, 'title' and 'description' fields are
multi-fields).

For instance, if searching for "hard drive enclosure" I would find a number
of records which have these words in the title and description. But then
the question is how to boost documents which contain the phrases "hard drive
enclosure", "hard drive" or "drive enclosure".

The following query works well for the first part of the requirement:

{
  "size": "40",
  "sort": { "_score": "" },
  "query": {
    "multi_match": {
      "query": "hard drive enclosure",
      "fields": [ "title^2", "description" ]
    }
  }
}

But how to boost phrase matches remains an open question. I believe I
should be able to construct a boolean 'should' query which combines the
existing query with a 'match_phrase' query. However, it seems like this may
create a performance issue (now more than one query has to run
simultaneously, when it should just be a matter of going through the result
set and adjusting the boost value).
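
For reference, the boolean combination described above could be sketched roughly as follows. This is only an illustration, not a tested recommendation: the helper names `subPhrases` and `buildBoolPhraseQuery` and the boost value are made up for the example, and it assumes the `title` and `description` fields from the mapping below. The keyword `multi_match` sits in `must` and decides which documents match, while the `match_phrase` clauses in `should` only add score to documents that already matched.

```javascript
// Build contiguous sub-phrases of two or more words, longest first.
// "hard drive enclosure" -> "hard drive enclosure", "hard drive", "drive enclosure"
function subPhrases(text) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  const phrases = [];
  for (let len = words.length; len >= 2; len--) {
    for (let start = 0; start + len <= words.length; start++) {
      phrases.push(words.slice(start, start + len).join(' '));
    }
  }
  return phrases;
}

// Hypothetical helper: keyword query in "must", phrase boosts in "should".
function buildBoolPhraseQuery(text, phraseBoost) {
  const should = [];
  for (const phrase of subPhrases(text)) {
    for (const field of ['title', 'description']) {
      should.push({ match_phrase: { [field]: { query: phrase, boost: phraseBoost } } });
    }
  }
  return {
    size: 40,
    query: {
      bool: {
        must: { multi_match: { query: text, fields: ['title^2', 'description'] } },
        should: should
      }
    }
  };
}

// Print the request body that would be sent to the _search endpoint.
console.log(JSON.stringify(buildBoolPhraseQuery('hard drive enclosure', 2), null, 2));
```

Whether the extra phrase clauses cost anything noticeable is best measured on real data; the sketch only shows the shape of the request.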

Another way of accomplishing this would be using a custom boost with
scripting. I looked at this but can't figure out for the life of me how to
find a substring of a string with MVEL. I have started looking at the
JavaScript plugin for this.
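
Besides scripting, the idea of "going through the result set and adjusting the score" maps fairly closely onto query rescoring, in Elasticsearch versions that provide the rescore feature (please check yours). A rough sketch follows; the helper name `buildRescoreRequest`, the window size, the slop and the weights are all illustrative assumptions. The keyword query runs over the whole index, and the phrase query is only evaluated against the top `window_size` hits on each shard. The explicit sort is left out because results come back ordered by score by default.

```javascript
// Hypothetical helper: cheap keyword query first, phrase rescoring on the top hits only.
function buildRescoreRequest(text, windowSize) {
  return {
    size: 40,
    query: { multi_match: { query: text, fields: ['title^2', 'description'] } },
    rescore: {
      window_size: windowSize, // number of top hits per shard to rescore
      query: {
        // Phrase match on title only, to keep the example small.
        rescore_query: { match_phrase: { title: { query: text, slop: 2 } } },
        query_weight: 1.0,          // weight of the original keyword score
        rescore_query_weight: 2.0   // weight of the phrase score
      }
    }
  };
}

console.log(JSON.stringify(buildRescoreRequest('hard drive enclosure', 100), null, 2));
```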

So, what would be the recommended way of creating this query?
All suggestions are appreciated.
Vic

Relevant parts of the mapping are as follows:

{
  "number_of_shards": 1,
  "index": {
    "analysis": {
      "filter": {
        "name_ngrams": {
          "side": "front",
          "max_gram": 8,
          "min_gram": 3,
          "type": "edgeNGram"
        }
      },
      "analyzer": {
        "full_name": {
          "filter": ["standard", "lowercase", "asciifolding"],
          "type": "custom",
          "tokenizer": "standard"
        },
        "partial_name": {
          "filter": ["standard", "lowercase", "asciifolding", "name_ngrams"],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}

"title": {
  "fields": {
    "title": {
      "type": "string",
      "analyzer": "full_name"
    },
    "title_partial": {
      "search_analyzer": "full_name",
      "index_analyzer": "partial_name",
      "type": "string"
    }
  },
  "type": "multi_field"
},
"description": {
  "fields": {
    "description": {
      "type": "string",
      "analyzer": "full_name"
    },
    "description_partial": {
      "search_analyzer": "full_name",
      "index_analyzer": "partial_name",
      "type": "string"
    }
  },
  "type": "multi_field"
},

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

egaumer · February 18, 2013, 8:40pm

What you probably want to do is use a DisMaxQuery. The query below will
match the user's query (keyword) against the title and content fields but
will boost phrases. You can adjust the slop to match exact phrases (this
example will boost fuzzy phrase matches). I used elastic.js because it's
terse compared to the JSON syntax, but you can convert this to JSON here
(link to Fullscale.co).

ejs.Request()
  .query(ejs.DisMaxQuery()
    .queries(ejs.MatchQuery('title', 'keyword'))
    .queries(ejs.MatchQuery('content', 'keyword'))
    .queries(ejs.MatchQuery('title', 'keyword')
      .boost(2)
      .type('phrase')
      .slop(5))
    .queries(ejs.MatchQuery('content', 'keyword')
      .boost(2)
      .type('phrase')
      .slop(20))
    .tieBreaker(0.1))
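
For readers without elastic.js at hand, roughly the same request can be written as a plain object and serialized to JSON. This is a sketch under a few assumptions: `buildDisMaxRequest` is a made-up helper name, `match_phrase` is used as the JSON counterpart of `MatchQuery(...).type('phrase')`, and the second field is `description` (as in the mapping from the question) rather than `content`.

```javascript
// Hypothetical helper: the dis_max query above as a plain request body.
function buildDisMaxRequest(text) {
  return {
    query: {
      dis_max: {
        tie_breaker: 0.1,
        queries: [
          // Plain keyword matches on each field.
          { match: { title: text } },
          { match: { description: text } },
          // Phrase matches with slop, boosted so they tend to win.
          { match_phrase: { title: { query: text, slop: 5, boost: 2 } } },
          { match_phrase: { description: { query: text, slop: 20, boost: 2 } } }
        ]
      }
    }
  };
}

console.log(JSON.stringify(buildDisMaxRequest('hard drive enclosure'), null, 2));
```

Setting `slop` to 0 on the phrase clauses would restrict the boost to exact phrase matches.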

Victor_Soloh · February 19, 2013, 12:05am

Egaumer,
Thanks for the response! The DisMax query you formulated works pretty well.
There are some odd cases where a document gets a lower score even though it
has a matching phrase in the title; I will have to investigate. But it
generally works well.
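
One possible explanation for those cases (an assumption, not something verified against this index) lies in how dis_max combines its clauses: the final score is the best clause score plus `tie_breaker` times the sum of the other matching clause scores. A phrase match in the title can therefore still lose to a document whose single best clause happens to score higher. The sketch below uses made-up clause scores purely to show the arithmetic; `disMaxScore` is an illustrative helper, not part of any library.

```javascript
// dis_max scoring: best clause + tieBreaker * (sum of the other matching clauses).
function disMaxScore(clauseScores, tieBreaker) {
  const matched = clauseScores.filter((s) => s > 0);
  if (matched.length === 0) return 0;
  const best = Math.max(...matched);
  const rest = matched.reduce((sum, s) => sum + s, 0) - best;
  return best + tieBreaker * rest;
}

// Hypothetical clause scores in the order:
// [title keywords, description keywords, title phrase, description phrase]
const phraseInTitle = disMaxScore([1.2, 0.0, 2.4, 0.0], 0.1);
const keywordsInDescription = disMaxScore([0.0, 3.1, 0.0, 0.0], 0.1);

// The document with the title phrase ends up ranked lower in this made-up case.
console.log(phraseInTitle < keywordsInDescription);
```

Running the search with `explain` enabled shows the real per-clause scores for a given document.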

Now, how can a poor developer come up with a query like this (including
parameters, etc.)? Are you able to do it because you have plenty of ES
experience and have done something similar in the past, or is there an
online tutorial somewhere (which I wasn't able to find, despite plenty of
trying)?

Thanks!
Vic

egaumer · February 19, 2013, 1:14am

Hehe... like most things, it comes with experience. You pick things up
along the way and I've been doing this (search) for quite a long time now.

The mailing list is a good place to ask for help though (good collective
knowledge). You could also attend one of the elasticsearch trainings which
can expedite the learning process.

In terms of tutorials, you're starting to see more thanks to the popularity
of elasticsearch but search itself has really been a black art over the
past decade. It's been one of those technologies that's been mostly limited
to the larger Fortune 500 companies. Thanks to elasticsearch (and Lucene)
that's changing and more and more developers are starting to familiarize
themselves with search because it's become more accessible.
