Constant boost for nested query


(Chris Yuska) #1

I have a Person object indexed with nested Work records. I want to search
for the Work "title" and give an extra boost to Work records that match on
title but are also still "current." My current query (shortened for the
sake of this thread) looks like this:

    query: {
      bool: {
        should: [
          { term: { "works.title": query } },
          { term: { "works.current": "true" } }
        ]
      }
    }

The full query also contains some "must" clauses, but the problem is that the
works.current clause should only apply to the Works that match the title. In
other words, a Person with 5 current Works, only 1 of which matches the title
query, should score the same as a Person with a single Work that is both a
title match and current. Additionally, a Person with 1 Work that is both a
title match and current should score higher than a Person with 1 Work that
is only a title match (not current). Right now, the Person with more
"current" Works scores higher, regardless of the number of title matches or
whether the current Works are the same Works that match the title.

I'm lost as to how to apply this boost only when the same nested object has
already matched the title. Any help?

Thanks.
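The core difficulty is that a flat bool over `works.title` and `works.current` can be satisfied by two different Works. The toy Python model below (plain Python, not Elasticsearch code; `Work`, `flattened_match` and `nested_match` are hypothetical names) contrasts matching against flattened field values, roughly what happens when Works are indexed as plain objects, with matching per object, which is what the nested type and a `nested` query are meant to provide.

```python
# Toy model (not Elasticsearch code): why clauses on an array of objects
# need to be scoped to a single object.
from dataclasses import dataclass


@dataclass
class Work:
    title: str
    current: bool


def flattened_match(works, title):
    """Object-style indexing: each field becomes one multi-valued list,
    so the two conditions can be satisfied by *different* works."""
    titles = [w.title for w in works]
    currents = [w.current for w in works]
    return title in titles and True in currents


def nested_match(works, title):
    """Nested-style indexing: both conditions must hold on the *same* work."""
    return any(w.title == title and w.current for w in works)


# A past "developer" job plus an unrelated current job.
person = [Work("developer", False), Work("manager", True)]
print(flattened_match(person, "developer"))  # True: fields mixed across works
print(nested_match(person, "developer"))     # False: no single work has both
```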

--


(phill) #2

(In reply to Chris Yuska, 9/12/2012 7:44 PM.)

I don't understand your whole problem, but a few things come to mind.
One is turning off coordination within the clauses of your bool query.
Another is to consider moving the checks for matches to a filter, so
that they don't affect the score.
Also try to learn something about search explain, though that is hard,
since its output is a bit cryptic, particularly because it works at the
Lucene level, which might not correspond exactly to the ES query.

More fundamentally, don't you want "must" for works.current? should by
itself means "at least one of these should match", while should alongside
other musts means "0 or more of these should match".
Right now you are getting title matches OR current matches, aren't you?

-Paul
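Paul's points about should versus must, and about coordination, can be made concrete with a toy model. The sketch below (plain Python; `bool_score` is a hypothetical helper and the numbers are made-up clause scores, not real Lucene scores) mimics classic Lucene `BooleanQuery` behaviour: a should-only query needs at least one should hit, should clauses become optional next to must clauses, and the summed score is multiplied by a coord factor (matched clauses / total clauses) unless coordination is disabled. The request body at the end assumes the `disable_coord` bool option of older Elasticsearch releases; later versions removed it.

```python
# Toy model of classic Lucene BooleanQuery matching and the coord factor.
def bool_score(must, should, minimum_should_match=None, disable_coord=False):
    """`must` and `should` hold per-clause scores for one document, with
    None meaning "this clause did not match". Returns None for a
    non-matching document, otherwise a toy score."""
    if any(s is None for s in must):
        return None  # every must clause has to match
    hits = [s for s in should if s is not None]
    # With no must clauses, at least one should clause has to match;
    # alongside must clauses, should clauses are optional.
    required = minimum_should_match
    if required is None:
        required = 0 if must else 1
    if len(hits) < required:
        return None
    raw = sum(must) + sum(hits)
    matched = len(must) + len(hits)
    total = len(must) + len(should)
    coord = 1.0 if disable_coord else matched / total
    return raw * coord


print(bool_score([], [None, 1.0]))                      # 0.5: current only, coord 1/2
print(bool_score([], [None, 1.0], disable_coord=True))  # 1.0: coord switched off
print(bool_score([2.0], [None]))                        # 1.0: must hit, should optional

# Corresponding request body; "explain" asks for a per-hit score breakdown.
search_text = "developer"  # hypothetical user input
body = {
    "explain": True,
    "query": {"bool": {
        "disable_coord": True,
        "should": [
            {"term": {"works.title": search_text}},
            {"term": {"works.current": "true"}},
        ],
    }},
}
```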

--


(Chris Yuska) #3

Thank you for the response. I probably should have disclosed more
information about the nature of this query. It is actually a general
query across all fields, driven by a search box. Each Person has multiple
other fields (name, location, etc.), as well as other nested objects.
Currently, I just have all the fields we're trying to match in the bool
query, along with the option minimum_number_should_match: 1, which would
incorrectly count a works.current hit on its own as a match, so I currently
have the works.current term disabled.

After your comment and some more research, if I understand correctly, I
should move all the fields I'm matching on in this basic query to a bool
filter (http://www.elasticsearch.org/guide/reference/query-dsl/bool-filter.html)?
Then, in the query, look for a match where works.current is true? The only
issue I see there is keeping some sort of sane scoring that isn't entirely
dependent on works.current being true.

Thanks again.
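The trade-off described above can be sketched as a toy model. Here `hits` maps field names to whether a person matched on them; `MATCH_FIELDS`, `old_query_matches` and `split_query` are hypothetical, and the scores are arbitrary. With `works.current` in the same should list and `minimum_number_should_match: 1`, a current job alone is enough to match; with the filter/query split, it can only raise the score of people who already matched. Because `hits` is per person, the split still rewards any current Work, not necessarily the one that matched the title.

```python
# Toy model of the two query shapes (not Elasticsearch code).
MATCH_FIELDS = ["name", "location", "works.title"]  # hypothetical field list


def old_query_matches(hits):
    """Everything, works.current included, sits in one should list with
    minimum_number_should_match: 1, so any single hit is enough."""
    clauses = MATCH_FIELDS + ["works.current"]
    return sum(hits.get(f, False) for f in clauses) >= 1


def split_query(hits):
    """Text fields act as a filter (match or not); works.current only
    adds to the score of people who already matched. Returns None for
    non-matches, otherwise an arbitrary toy score."""
    if not any(hits.get(f, False) for f in MATCH_FIELDS):
        return None
    return 1.0 + (1.0 if hits.get("works.current", False) else 0.0)


only_current = {"works.current": True}
print(old_query_matches(only_current))  # True: a current job alone matches
print(split_query(only_current))        # None: filtered out
print(split_query({"works.title": True, "works.current": True}))  # 2.0
print(split_query({"works.title": True}))                         # 1.0
```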

(In reply to P Hill, Friday, September 14, 2012 12:27:35 PM UTC-4.)

--


(Lukáš Vlček) #4

Hi,

from reading your post I am not sure whether you are actually using the
nested type or not. IMO this can be quite important in your case:
http://www.elasticsearch.org/guide/reference/mapping/nested-type.html

Regards,
Lukas
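For reference, a mapping along these lines marks `works` as nested. This is a sketch: the `person` type name and the field types are assumptions (`string` was the text field type in Elasticsearch versions of that era). With the nested type, each Work is indexed as a separate hidden document, so its fields are normally queried through a `nested` query; without it, the Works are indexed as plain objects whose values are flattened into the parent document.

```python
import json

# Hypothetical mapping for a "person" type; field names follow the thread,
# field types are assumptions.
mapping = {
    "person": {
        "properties": {
            "name": {"type": "string"},
            "works": {
                "type": "nested",  # each Work becomes its own hidden document
                "properties": {
                    "title": {"type": "string"},
                    "current": {"type": "boolean"},
                },
            },
        }
    }
}

print(json.dumps(mapping, indent=2))
```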

(In reply to Chris Yuska, Fri, Sep 14, 2012 at 7:17 PM.)

--


(Chris Yuska) #5

Hi Lukas,

I am using the nested type. Any advice on what I'm doing wrong or what I
need to do?

Thanks.

(In reply to Lukáš Vlček, Saturday, September 15, 2012 3:02:31 PM UTC-4.)

--


(Lukáš Vlček) #6

Hi,

I tried to do a full recreation. You can find it here:
https://gist.github.com/3739469

To me it seems to work correctly: Lukas and Karel score equally
(even though Lukas has more current nested objects than Karel), and Jan
scores lower (he has no nested object with current set to true).

Regards,
Lukas

Note to @karmiq: the Karel in this example is a different Karel :slight_smile:
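One way to express the behaviour described above (a sketch, not necessarily what the gist contains) is to put both clauses inside one `nested` query: the title term is required, and the current term can only add to the score of that same Work. `nested_title_query`, `toy_person_score` and the boost of 2.0 are hypothetical, and the toy scorer is not Lucene scoring; it only illustrates that Works which don't match the title never contribute, and that `score_mode: "max"` makes the parent take its best-matching Work's score.

```python
def nested_title_query(search_text, boost_current=2.0):
    """Hypothetical builder: title is required inside the nested scope;
    'current' can only boost the same Work."""
    return {
        "nested": {
            "path": "works",
            "score_mode": "max",  # parent takes its best Work's score
            "query": {
                "bool": {
                    "must": [{"term": {"works.title": search_text}}],
                    "should": [
                        {"term": {"works.current": {"value": "true", "boost": boost_current}}}
                    ],
                }
            },
        }
    }


def toy_person_score(works, search_text, boost_current=2.0):
    """works: list of (title, current) tuples. Only Works whose title
    matches participate; the parent keeps the maximum."""
    scores = [1.0 + (boost_current if current else 0.0)
              for title, current in works if title == search_text]
    return max(scores) if scores else None


many_current = [("developer", True), ("manager", True), ("tester", True)]
one_current = [("developer", True)]
not_current = [("developer", False)]
for works in (many_current, one_current, not_current):
    print(toy_person_score(works, "developer"))  # 3.0, 3.0, 1.0
```

`avg` is the nested query's default score mode; a summing mode would instead favour people with several matching Works.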

(In reply to Chris Yuska, Mon, Sep 17, 2012 at 8:16 PM.)

--


(Chris Yuska) #7

Hey again,

Thanks again for your help. I have a couple of issues with the recreation.
First, the "dummy" match restricts the match to only the specified nested
works. There is no "must" in my query, as it's really just checking that
the search query "must match one of these fields." Which brings me to the
second issue: the "name" field isn't being matched on in your recreation.
I modified your gist a little to illustrate my intention
(https://gist.github.com/3739681).

It's incorrect, as the "should match name" clause doesn't actually hit
anything anyway, but I hope it at least shows what I'm trying to do. A
search for "developer" should find all people with a job title of
"developer" or a name of "developer." If a person has a job title of
"developer" and that job is also current, the person should get a boost in
overall score. Does this make more sense? What I really don't know how to
do is combine the nested query for "title" and "current" with the top-level
query for "name."
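One possible shape for that combination is a top-level `bool` whose `should` clauses are the `name` term and the whole nested Work query. This is a sketch: `person_search`, `toy_matches` and the boost value are hypothetical, and `term` is kept from the thread although a full-text query is usually a better fit for search-box input. Because `works.current` sits inside the nested `bool` under a required title clause, a current job on its own cannot make a person match; it can only boost the Work that matched the title.

```python
import json


def person_search(search_text, boost_current=2.0):
    """Hypothetical builder: match on name OR on a Work title; a current
    Work only boosts when it is the same Work that matched the title."""
    works_clause = {
        "nested": {
            "path": "works",
            "score_mode": "max",
            "query": {
                "bool": {
                    "must": [{"term": {"works.title": search_text}}],
                    "should": [
                        {"term": {"works.current": {"value": "true", "boost": boost_current}}}
                    ],
                }
            },
        }
    }
    return {
        "query": {
            "bool": {
                # No must clauses, so at least one should clause has to match.
                "should": [
                    {"term": {"name": search_text}},
                    works_clause,
                ]
            }
        }
    }


def toy_matches(person, search_text):
    """Toy check of which people the body above would return."""
    name_hit = person["name"] == search_text
    title_hit = any(title == search_text for title, _current in person["works"])
    return name_hit or title_hit


print(json.dumps(person_search("developer"), indent=2))
```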

(In reply to Lukáš Vlček, Monday, September 17, 2012 4:16:35 PM UTC-4.)

--


(Chris Yuska) #8

I just happened across this issue:
https://github.com/elasticsearch/elasticsearch/issues/1383. Assuming it
hasn't been implemented yet, it's essentially what would give me the
visibility I need for my purposes.

(Follow-up to Chris Yuska's message of Monday, September 17, 2012 5:35:58 PM UTC-4.)

--

