Query by keywords and phrase. Performance question

Hello,

I am relatively new to Elasticsearch. I understand the basic concepts
but have trouble with more advanced queries.
I am trying to build a query which searches by several keywords on two
fields, 'title' and 'description', where matches on 'title' are boosted.
I then need to boost results which contain phrase matches so that they
appear first when sorted by score. (Both 'title' and 'description' are
multi-fields.)

For instance, when searching for "hard drive enclosure" I find a number
of records which have these words in the title and description. The
question is how to boost documents which contain the phrases "hard drive
enclosure", "hard drive" or "drive enclosure".
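To make the requirement concrete, here is a minimal sketch of how the phrases in question could be enumerated from the keywords. The helper name `subPhrases` is hypothetical and not part of Elasticsearch; it simply lists every contiguous run of two or more words, longest first.

```javascript
// Hypothetical helper (not an Elasticsearch API): list every contiguous
// run of two or more words in the query, longest runs first.
function subPhrases(query) {
  const words = query.trim().split(/\s+/).filter(Boolean);
  const phrases = [];
  for (let len = words.length; len >= 2; len--) {
    for (let start = 0; start + len <= words.length; start++) {
      phrases.push(words.slice(start, start + len).join(" "));
    }
  }
  return phrases;
}

// Logs the three phrases from the example:
// "hard drive enclosure", "hard drive", "drive enclosure"
console.log(subPhrases("hard drive enclosure"));
```

Each phrase could then get its own phrase clause, or a single phrase clause with some slop could stand in for all of them.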

The following query works well for the first part of the requirement:

{
  "size": 40,
  "sort": [ { "_score": "desc" } ],
  "query": {
    "multi_match": {
      "query": "hard drive enclosure",
      "fields": [ "title^2", "description" ]
    }
  }
}

But how to boost phrase matches is still an open question. I believe I
should be able to construct a boolean 'should' query which combines the
existing query with a 'match_phrase' query. However, it seems like this
may create a performance issue (more than one query now has to run, when
it should just be a matter of going through the result set and adjusting
the boost value).
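For reference, a rough sketch of the boolean approach described above, written as plain JavaScript that builds the request body. The function name `buildPhraseBoostedQuery` and the boost values are assumptions chosen for illustration; only the `bool`, `multi_match` and `match_phrase` query types come from Elasticsearch itself.

```javascript
// Sketch only: the keyword match decides which documents are returned,
// while the phrase clauses under "should" only add to the score.
// The function name and boost values are illustrative assumptions.
function buildPhraseBoostedQuery(text, phraseBoost) {
  return {
    size: 40,
    query: {
      bool: {
        must: [
          { multi_match: { query: text, fields: ["title^2", "description"] } }
        ],
        should: [
          // Title phrase matches get twice the boost, mirroring "title^2".
          { match_phrase: { title: { query: text, boost: phraseBoost * 2 } } },
          { match_phrase: { description: { query: text, boost: phraseBoost } } }
        ]
      }
    }
  };
}

console.log(JSON.stringify(buildPhraseBoostedQuery("hard drive enclosure", 2), null, 2));
```

Because the phrase clauses are optional, a document without any phrase match is still returned; it just ranks lower.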

Another way of accomplishing this would be to use a custom boost with
scripting. I looked at this but can't figure out for the life of me how to
find a substring of a string with MVEL. I have started looking at the
JavaScript plugin for this.

So, what would be the recommended way of creating this query?
All suggestions are appreciated.
Vic

Relevant parts of the mapping are as follows:

{
  "number_of_shards" : 1,
  "index" : {
    "analysis" : {
      "filter" : {
        "name_ngrams" : {
          "side" : "front",
          "max_gram" : 8,
          "min_gram" : 3,
          "type" : "edgeNGram"
        }
      },
      "analyzer" : {
        "full_name" : {
          "filter" : [ "standard", "lowercase", "asciifolding" ],
          "type" : "custom",
          "tokenizer" : "standard"
        },
        "partial_name" : {
          "filter" : [ "standard", "lowercase", "asciifolding", "name_ngrams" ],
          "type" : "custom",
          "tokenizer" : "standard"
        }
      }
    }
  }
}

"title" : {
  "fields" : {
    "title" : {
      "type" : "string",
      "analyzer" : "full_name"
    },
    "title_partial" : {
      "search_analyzer" : "full_name",
      "index_analyzer" : "partial_name",
      "type" : "string"
    }
  },
  "type" : "multi_field"
},
"description" : {
  "fields" : {
    "description" : {
      "type" : "string",
      "analyzer" : "full_name"
    },
    "description_partial" : {
      "search_analyzer" : "full_name",
      "index_analyzer" : "partial_name",
      "type" : "string"
    }
  },
  "type" : "multi_field"
},
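
To illustrate what the 'partial_name' analyzer indexes, here is a simplified model of the 'name_ngrams' filter for a single, already lowercased token. The function `edgeNGrams` is a hypothetical stand-in and not Lucene's implementation; it only mirrors the "side": "front", "min_gram": 3 and "max_gram": 8 settings from the mapping.

```javascript
// Simplified model of a front-anchored edge n-gram filter: emit the
// prefixes of the token whose length is between minGram and maxGram.
// This is an illustrative assumption, not Lucene's actual code.
function edgeNGrams(token, minGram, maxGram) {
  const grams = [];
  const longest = Math.min(maxGram, token.length);
  for (let len = minGram; len <= longest; len++) {
    grams.push(token.slice(0, len));
  }
  return grams;
}

// Logs: enc, encl, enclo, enclos, enclosu, enclosur
console.log(edgeNGrams("enclosure", 3, 8).join(", "));
```

In this model a search for "encl" against 'title_partial' would match, since the search side uses 'full_name' and leaves the term whole.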

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

What you probably want is a DisMaxQuery. The query below matches the
user's query (keyword) against the title and content fields and boosts
phrase matches. You can lower the slop to 0 to match only exact phrases
(this example boosts fuzzy phrase matches). I used elastic.js because it's
terse compared to the JSON syntax, but you can convert this to JSON.

ejs.Request()
  .query(ejs.DisMaxQuery()
    // plain keyword matches on each field
    .queries(ejs.MatchQuery('title', 'keyword'))
    .queries(ejs.MatchQuery('content', 'keyword'))
    // phrase matches on each field, boosted; slop allows loose phrases
    .queries(ejs.MatchQuery('title', 'keyword')
      .boost(2)
      .type('phrase')
      .slop(5))
    .queries(ejs.MatchQuery('content', 'keyword')
      .boost(2)
      .type('phrase')
      .slop(20))
    .tieBreaker(0.1))
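
For anyone who prefers the JSON syntax, the request body for the query above would look roughly like what this plain JavaScript builds. This is a sketch and not the output of elastic.js: the function name `buildDisMaxQuery` is made up, and it assumes that a match query of type 'phrase' corresponds to a `match_phrase` clause. The field name 'content' is taken from the example; the mapping in the question uses 'description'.

```javascript
// Sketch of the JSON request body for the DisMax query above.
// Assumption: MatchQuery(...).type('phrase') maps to "match_phrase".
function buildDisMaxQuery(text) {
  return {
    query: {
      dis_max: {
        tie_breaker: 0.1,
        queries: [
          // plain keyword matches
          { match: { title: text } },
          { match: { content: text } },
          // boosted phrase matches with slop
          { match_phrase: { title: { query: text, slop: 5, boost: 2 } } },
          { match_phrase: { content: { query: text, slop: 20, boost: 2 } } }
        ]
      }
    }
  };
}

console.log(JSON.stringify(buildDisMaxQuery("hard drive enclosure"), null, 2));
```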

On Monday, February 18, 2013 2:12:40 PM UTC-5, Victor Soloh wrote:

[...]

Egaumer,
Thanks for the response! The DisMax query you formulated works pretty well.
There are some weird cases where a document gets a lower score even though
it has a matching phrase in the title; I will have to investigate. But
generally it works well.
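
One possible explanation for those cases (a guess, not something confirmed in this thread) is the way a DisMax query combines its clauses: the score is the best clause score plus the tie breaker times the sum of the other matching clause scores. A strong keyword match in one field can therefore outscore a weaker phrase match in the title. The clause scores below are made up purely for the arithmetic.

```javascript
// DisMax scoring: best clause + tieBreaker * (sum of the other clauses).
// Clause scores passed in here are invented numbers for illustration.
function disMaxScore(clauseScores, tieBreaker) {
  if (clauseScores.length === 0) return 0;
  const best = Math.max(...clauseScores);
  const total = clauseScores.reduce((sum, s) => sum + s, 0);
  return best + tieBreaker * (total - best);
}

// Document A: title keyword match 1.2, title phrase match 2.4.
const withTitlePhrase = disMaxScore([1.2, 2.4], 0.1);
// Document B: no phrase match, but a strong keyword match elsewhere (3.0).
const withoutPhrase = disMaxScore([0.5, 3.0], 0.1);

// B ends up above A (roughly 3.05 versus 2.52) despite having no phrase match.
console.log(withTitlePhrase, withoutPhrase, withoutPhrase > withTitlePhrase);
```

Running the search with "explain": true in the request body shows the real per-clause scores for each hit, which should make it clear whether this is what is happening.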

Now, how can a poor developer come up with a query like this (including
the parameters, etc.)? Are you able to do it because you have plenty of ES
experience and have done something similar in the past, or is there an
online tutorial somewhere (which I wasn't able to find, despite plenty of
trying)?

Thanks!
Vic

On Monday, February 18, 2013 3:40:55 PM UTC-5, egaumer wrote:

[...]

Hehe... like most things, it comes with experience. You pick things up
along the way and I've been doing this (search) for quite a long time now.

The mailing list is a good place to ask for help, though (good collective
knowledge). You could also attend one of the Elasticsearch trainings, which
can expedite the learning process.

In terms of tutorials, you're starting to see more thanks to the popularity
of Elasticsearch, but search itself has really been a black art over the
past decade. It's been one of those technologies that's been mostly limited
to the larger Fortune 500 companies. Thanks to Elasticsearch (and Lucene)
that's changing, and more and more developers are starting to familiarize
themselves with search because it's become more accessible.

On Monday, February 18, 2013 7:05:13 PM UTC-5, Victor Soloh wrote:

[...]