Filter based on score


(Marcelo Elias Del Valle) #1

I would like to filter results based on score, but min_score only works at
the top level of my search; it doesn't work inside a filtered query.
Is there any other way to exclude results with score = 0, besides
min_score?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Ivan Brusic) #2

The filter that is part of a filtered query is executed before the query,
therefore the filter does not have any scores since scoring happens during
the query.

min_score is the easiest solution. Have you looked at the custom filters
score query, which has been replaced by the function score query (I have
never used it, so I cannot comment on it)? Perhaps if you adjust your scoring,
there would be no need to do additional filtering.

Cheers,

Ivan



(Marcelo Elias Del Valle) #3

Ivan,

But I need the filtering. What I am trying to do is use the score to
filter based on result frequency.
I am already using function_score, as shown below. However, I need
to return only parent docs that have at least N children.
Isn't the query inside the "or" filter executed before the filter
itself? I was guessing min_score should apply to the "function_score" query
below...

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "or": [
          {
            "min_score": 0,
            "query": {
              "function_score": {
                "query": {
                  "has_child": {
                    "type": "comment",
                    "score_type": "sum",
                    "boost": 1,
                    "query": {
                      "range": {
                        "date": {
                          "lte": 20130204,
                          "gte": 20130201,
                          "boost": 1
                        }
                      }
                    }
                  }
                },
                "boost": "1",
                "script_score": {
                  "script": "_score"
                },
                "boost_mode": "replace"
              }
            }
          }
        ]
      }
    }
  }
}

Best regards,
Marcelo.



(Ivan Brusic) #4

I have not looked at your query, but you can always apply a filter to your
query after it executes.

With a filtered query, the filter happens first, which is normally optimal
because you do not want to score documents that will eventually be filtered
out. There are a few use cases where a filtered query (aka pre filter) does
not work and your use case might be one of them. Of course, you can have
both a filtered query and then a standard filter.
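
A minimal sketch of that last point, assuming the request format of the Elasticsearch releases discussed in this thread, where a top-level "filter" element is applied after the query has run (later releases call it post_filter). The Python below only builds the request body as a dict and prints it; it does not talk to a cluster, and the "status" field and its value are hypothetical.

```python
import json


def build_search_body(query, pre_filter, post_filter):
    """Combine a filtered query (pre-filter) with a top-level filter."""
    return {
        "query": {
            "filtered": {
                "query": query,        # scored part
                "filter": pre_filter,  # applied before scoring
            }
        },
        "filter": post_filter,         # applied to the hits after the query
    }


body = build_search_body(
    query={"match_all": {}},
    pre_filter={"term": {"status": "published"}},  # hypothetical field
    post_filter={"range": {"date": {"gte": 20130201, "lte": 20130204}}},
)
print(json.dumps(body, indent=2))
```

Neither filter can see _score; the top-level filter only changes when the filtering happens, so score-based cutoffs still need min_score.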

Cheers,

Ivan



(Marcelo Elias Del Valle) #5

Ivan,

But if I cannot use a filtered query that is executed after the score is
calculated, is there another way of executing boolean conditions (like AND,
OR, NOT) on query results?

Best regards,
Marcelo.

--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr



(Luca Cavanna) #6

Hi Marcelo,
you can use a bool query to achieve the same, can't you? Must clauses are
mandatory, should clauses are optional, and you can control how many of them
must match to achieve something that's actually between a logical AND and a
logical OR.

We also have the bool filter, which is good in most cases for doing the same
with filters, when you don't care about scoring and you just want to filter
out results. The reason why we have and, or and not filters too is the way
they get executed, which is, briefly, less cache-friendly compared to the bool
filter, but helps when you have expensive filters that are not easily
cacheable and you want to control the execution order of those. Have a look
at this article that clearly explains the difference.
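
To make the must/should/must_not description concrete, here is a small model of the matching logic in plain Python. It is only an illustration, not Elasticsearch code: clauses are ordinary predicate functions, the helper name bool_matches is made up for this sketch, and the default for minimum_should_match (1 when there are only should clauses, otherwise 0) is a simplifying assumption.

```python
def bool_matches(doc, must=(), should=(), must_not=(), minimum_should_match=None):
    """Return True if doc satisfies the clauses, in the spirit of a bool query.

    Each clause is a predicate: a function taking doc and returning a bool.
    """
    if minimum_should_match is None:
        # Assumed default: should clauses are optional once a must is present.
        minimum_should_match = 1 if (should and not must) else 0
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


def in_date_range(doc):
    return 20130201 <= doc["date"] <= 20130204


def mentions_finally(doc):
    return "finally" in doc["text"].lower()


comment = {"date": 20130112, "text": "Finally!"}

# OR: at least one should clause has to match.
print(bool_matches(comment, should=[in_date_range, mentions_finally],
                   minimum_should_match=1))
# AND: every must clause has to match.
print(bool_matches(comment, must=[in_date_range, mentions_finally]))
# NOT: no must_not clause may match.
print(bool_matches(comment, must_not=[in_date_range]))
```

Raising minimum_should_match above 1 gives the "between AND and OR" behaviour mentioned above.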

Cheers
Luca



(Clinton Gormley) #7

Hi Marcelo

I've written up a working example of how to use the function_score query
to achieve what you want:

https://gist.github.com/clintongormley/7039568

You can only use min_score at the top level, but I've used the
function_score to multiply the real score by -1 if a parent document
doesn't have enough children (cutoff).

I also demonstrate how you could combine this technique with a full text
query on the parent docs.
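
The arithmetic behind this trick can be sketched in a few lines of plain Python. This is not the code from the gist and does not call Elasticsearch; the helper names, the cutoff and the per-parent scores are hypothetical, and min_score is assumed to keep hits whose score is greater than or equal to it.

```python
def adjusted_score(child_score_sum, cutoff):
    """Flip the sign of the score when it falls below the cutoff."""
    factor = -1 if child_score_sum < cutoff else 1
    return child_score_sum * factor


def apply_min_score(scores, min_score=0):
    """Keep only hits whose score is at least min_score."""
    return {doc_id: s for doc_id, s in scores.items() if s >= min_score}


# Hypothetical summed child scores per parent document id.
child_sums = {"1": 4.0, "2": 1.0, "3": 3.0}
cutoff = 3

adjusted = {doc_id: adjusted_score(s, cutoff) for doc_id, s in child_sums.items()}
hits = apply_min_score(adjusted, min_score=0)
print(hits)  # {'1': 4.0, '3': 3.0}
```

Parent "2" ends up with a negative score and is dropped by the top-level min_score, while the parents that reach the cutoff keep their original score.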

clint



(Marcelo Elias Del Valle) #8

Luca,

First of all, thanks a lot for the answer. I had seen the bool query
yesterday, but I probably missed something, because I had not understood
that it was the same as AND, OR and NOT. So a should query with
minimum_should_match = 1 will give me the same result as the OR, right? I
am sorry, I should have realized that before asking! Thanks a lot, it was
very helpful. I still have the problem of filtering on the score of each
has_child query, but I guess Clinton's message will clear that up as soon
as I understand it properly. :)

Best regards,
Marcelo Valle.



(Marcelo Elias Del Valle) #9

Clinton,

Thanks for the answer, and sorry to make you write an example. However,
I am still having trouble understanding what you said...
I had written a query like this before:
documents/document/_search
{
  "min_score": 0,
  "query": {
    "has_child": {
      "type": "comment",
      "score_type": "sum",
      "boost": 1,
      "query": {
        "range": {
          "date": {
            "lte": 20130204,
            "gte": 20130201,
            "boost": 1
          }
        }
      }
    }
  }
}

Even without using function score, it does what I want, but function
score is more flexible because I can have a condition such as (N1 <
score < N2). But even if this solves my problem for one has_child query, what
should I do when I have multiple has_child queries? Each function score would
have boost_mode = mult, right? But wouldn't that mean that if the two function
scores returned -1, I would get a positive number as the final result?
I didn't properly understand how the bool query will get the scores from
the function_score queries and how they will be combined. For instance, if I
use should, one of the two scores should be greater than 0, but if I use
must, both scores should be greater than 0... I was guessing that what I need
is to have min_score inside the must clause, not at the top level.
Would you show me a working example with more than one query? The example
below was my attempt to make your suggestion work.

{
  "query": {
    "bool": {
      "must": [
        {
          "function_score": {
            "query": {
              "has_child": {
                "type": "comment",
                "score_type": "sum",
                "boost": 1,
                "query": {
                  "range": {
                    "date": {
                      "lte": 20130204,
                      "gte": 20130201,
                      "boost": 1
                    }
                  }
                }
              }
            },
            "boost": "1",
            "script_score": {
              "script": "_score<3?-1:1"
            },
            "boost_mode": "mult"
          }
        },
        {
          "function_score": {
            "query": {
              "has_child": {
                "type": "comment",
                "score_type": "sum",
                "boost": 1,
                "query": {
                  "constant_score": {
                    "filter": {
                      "term": { "text": "finally" }
                    }
                  }
                }
              }
            },
            "boost": "1",
            "script_score": {
              "script": "_score<1?-1:1"
            },
            "boost_mode": "mult"
          }
        }
      ]
    }
  }
}
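
One way to reason about how the two scores combine is to model the arithmetic by hand. The sketch below is plain Python, not Elasticsearch code, and rests on stated assumptions: each function_score with boost_mode "mult" multiplies its own query score by the script result, and the bool query then adds the scores of its matching clauses (coord and query normalization are ignored). The thresholds and raw scores are hypothetical.

```python
def function_score_mult(query_score, threshold):
    """Model boost_mode "mult" with a script like `_score < threshold ? -1 : 1`."""
    script_result = -1 if query_score < threshold else 1
    return query_score * script_result


def bool_must_score(clause_scores):
    """Model a bool query whose must clauses all matched: scores are added."""
    return sum(clause_scores)


# Hypothetical raw has_child scores for one parent: (date clause, text clause).
cases = {
    "both pass": (4.0, 2.0),
    "both fail": (2.0, 1.0),
    "mixed": (5.0, 1.0),
}
thresholds = (3, 2)

totals = {}
for name, raw_scores in cases.items():
    adjusted = [function_score_mult(s, t) for s, t in zip(raw_scores, thresholds)]
    totals[name] = bool_must_score(adjusted)

print(totals)  # {'both pass': 6.0, 'both fail': -3.0, 'mixed': 4.0}
```

Under this model two failing clauses add up to a negative total rather than multiplying into a positive one. The "mixed" case, however, stays above a top-level min_score of 0 even though one clause missed its threshold, so a penalty larger than -1 might be needed if every clause is meant to be mandatory.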

================================
curl -XPUT http://localhost:9200/documents
curl -XPUT http://localhost:9200/documents/comment/_mapping -d '{
"comment" : {
"_parent" : {
"type" : "document"
}
}
}'
curl -XPUT http://localhost:9200/documents/document/1 -d '{
"title": "Parent document with comments 1",
"content": "several test comments :D"
}'

curl -XPUT http://localhost:9200/documents/document/2 -d '{
"title": "Parent document with comments 2",
"content": "several test comments :D"
}'

curl -XPUT http://localhost:9200/documents/document/3 -d '{
"title": "Parent document with comments 3",
"content": "several test comments :D"
}'

curl -XPUT http://localhost:9200/documents/document/4 -d '{
"title": "Parent document with comments 4",
"content": "several test comments :D"
}'

curl -XPUT 'http://localhost:9200/documents/comment/1?parent=1' -d '{
"root_id": 1,
"text": "Oh my god ...",
"author": "John Doe",
"date": [20130201, 20130202],
"field1": 1,
"field2": 10
}'

curl -XPUT 'http://localhost:9200/documents/comment/2?parent=1' -d '{
"root_id": 1,
"text": "Finally!",
"author": "Jane Roe",
"date": [20130202],
"field1": 1,
"field2": 10
}'

curl -XPUT 'http://localhost:9200/documents/comment/3?parent=1' -d '{
"root_id": 1,
"text": "text 3",
"author": "Walter",
"date": [20130201],
"field1": 1,
"field2": 20
}'

curl -XPUT 'http://localhost:9200/documents/comment/4?parent=1' -d '{
"root_id": 1,
"text": "text 4",
"author": "Mario",
"date": [20130112],
"field1": 1,
"field2": 20
}'

curl -XPUT 'http://localhost:9200/documents/comment/5?parent=1' -d '{
"root_id": 1,
"text": "text 5",
"author": "Maria",
"date": [20130203],
"field1": 2,
"field2": 10
}'

curl -XPUT 'http://localhost:9200/documents/comment/6?parent=1' -d '{
"root_id": 1,
"text": "text 6",
"author": "Marcel",
"date": [20130205],
"field1": 2,
"field2": 20
}'

curl -XPUT 'http://localhost:9200/documents/comment/7?parent=2' -d '{
"root_id": 2,
"text": "text 7",
"author": "Marcelo",
"date": [20130204],
"field1": 2,
"field2": 20
}'

Best regards,
Marcelo.

2013/10/18 Clinton Gormley clint@traveljury.com

Hi Marcelo

I've written up a working example of how to use the function_score query
to achieve what you want:

https://gist.github.com/clintongormley/7039568

You can only use min_score at the top level, but I've used the
function_score to multiply the real score by -1 if a parent document
doesn't have enough children (cutoff).

I also demonstrate how you could combine this technique with a full text
query on the parent docs.

clint
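
For readers without access to the gist, here is a minimal sketch of the idea described above; it is not the gist's actual contents. It is assembled from the query shapes that appear elsewhere in this thread (has_child with score_type sum, a script_score with a cutoff parameter, boost_mode mult, and a top-level min_score). The helper name `build_min_children_query` and the Python wrapper are assumptions made for illustration; only the resulting JSON body would be sent to Elasticsearch.

```python
import json


def build_min_children_query(child_type, child_query, cutoff):
    """Hypothetical helper: search body that keeps only parents whose
    summed child score reaches `cutoff`."""
    return {
        # min_score is only honoured at the top level of the request.
        "min_score": 0,
        "query": {
            "function_score": {
                "query": {
                    "has_child": {
                        "type": child_type,
                        # With constant_score children, the sum is expected
                        # to act as a count of matching children.
                        "score_type": "sum",
                        "query": {"constant_score": {"query": child_query}},
                    }
                },
                "script_score": {
                    "params": {"cutoff": cutoff},
                    # Below the cutoff the factor is -1, otherwise 1.
                    "script": "(_score > 0) && (_score < cutoff) ? -1 : 1",
                },
                # Multiply the has_child score by the script result.
                "boost_mode": "mult",
            }
        },
    }


body = build_min_children_query("comment", {"match": {"text": "Finally!"}}, 2)
print(json.dumps(body, indent=2))
```

A parent with too few matching children ends up with a negative score, which the top-level min_score of 0 then drops.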

On 18 October 2013 10:30, Luca Cavanna cavannaluca@gmail.com wrote:

Hi Marcelo,
you can use a bool query to achieve the same, can't you? Must clauses are
mandatory, should clauses are optional, and you can control how many of them
must match, which gives you something between a logical AND and a logical OR.
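
As a rough illustration of the bool query Luca describes, the sketch below builds a request body with one mandatory clause and three optional ones, two of which must match. The field values are borrowed from the sample comments in this thread; the helper name and the choice of clauses are assumptions made for illustration. `minimum_should_match` is the bool query parameter that sets how many should clauses are required.

```python
import json


def build_bool_query(must, should, minimum_should_match):
    """Hypothetical helper: combine mandatory and optional clauses."""
    return {
        "query": {
            "bool": {
                # Every clause listed here has to match (AND-like).
                "must": must,
                # Optional clauses (OR-like).
                "should": should,
                # Require at least this many of the should clauses.
                "minimum_should_match": minimum_should_match,
            }
        }
    }


body = build_bool_query(
    must=[{"term": {"field1": 1}}],
    should=[
        {"term": {"field2": 10}},
        {"match": {"author": "John Doe"}},
        {"match": {"text": "Finally!"}},
    ],
    minimum_should_match=2,
)
print(json.dumps(body, indent=2))
```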

We also have the bool filter, which in most cases is the right way to do
the same with filters, when you don't care about scoring and just want to
filter out results. We have the and, or and not filters too because of the
way they get executed: briefly, they are less cache-friendly than the bool
filter, but they help when you have expensive filters that are not easily
cacheable and you want to control their execution order. Have a look at
this article that clearly explains the difference.

Cheers
Luca

On Thursday, October 17, 2013 11:37:28 PM UTC+2, Marcelo Elias Del Valle
wrote:

Ivan,

But if I cannot use a filter that is executed after the score is
calculated, is there another way of applying boolean conditions (like AND,
OR, NOT) to query results?

Best regards,
Marcelo.

2013/10/17 Ivan Brusic iv...@brusic.com

I have not looked at your query, but you can always apply a filter to
your query after it executes.

With a filtered query, the filter happens first, which is normally
optimal because you do not want to score documents that will eventually be
filtered out. There are a few use cases where a filtered query (aka pre
filter) does not work and your use case might be one of them. Of course,
you can have both a filtered query and then a standard filter.

Cheers,

Ivan

On Thu, Oct 17, 2013 at 1:50 PM, Marcelo Elias Del Valle <
mval...@gmail.com> wrote:

Ivan,

But I need the filtering. What I am trying to do is use the score to
filter based on result frequency.
I am already using function_score, as shown below. However, I need to
return only parent docs which have at least N children.
Isn't the query inside the "or" filter executed before the filter
itself? I was guessing min_score should apply to the "function_score" query
below...

{
"query": {
"filtered" : {
"query": {
"match_all": {
}
},
"filter" : {
"or" : [
{
"min_score": 0,
"query": {
"function_score": {
"query": {
"has_child" : {
"type" : "comment",
"score_type" : "sum",
"boost": 1,
"query" : {
"range": {
"date": {
"lte": 20130204,
"gte": 20130201,
"boost": 1
}
}
}
}
},
"boost": "1",
"script_score" : {
"script" : "_score"
},
"boost_mode":"replace"
}
}
}
]
}
}
}
}

Best regards,
Marcelo.

--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr



(Clinton Gormley) #10

Hi Marcelo

On 18 October 2013 17:28, Marcelo Elias Del Valle mvallebr@gmail.com wrote:

Even without using function score, it does what I want, but function
score is really more flexible because I can have a condition such as (N1 <
score < N2). But even if this solves my problem for one has_child query, what
should I do when I have multiple has_child queries? Each function score would
have boost_mode = mult, right? But wouldn't that mean that if the two
function scores returned -1, I would have a positive number as the final
result?

In this case, change the script to:
"(_score > 0) && (_score < cutoff) ? -1 : 1"

Btw, you should use parameters (like cutoff) rather than putting those
values directly into the script, otherwise every time you change the cutoff
value Elasticsearch has to recompile the script.
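
To make the two points above concrete, here is a small sketch. `script_factor` mirrors the suggested expression in plain Python so its behaviour can be checked by hand, and `script_score_clause` shows the cutoff being passed through `params` so the script text stays identical across requests. Both function names are made up for this illustration; the script string and the `params` layout are the ones used in the queries in this thread.

```python
SCRIPT = "(_score > 0) && (_score < cutoff) ? -1 : 1"


def script_factor(score, cutoff):
    """Plain-Python mirror of SCRIPT: -1 below the cutoff, otherwise 1."""
    return -1 if (score > 0 and score < cutoff) else 1


def script_score_clause(cutoff):
    """Build the script_score part of a function_score query.

    The cutoff travels in params, so the script source never changes."""
    return {"params": {"cutoff": cutoff}, "script": SCRIPT}


# Same script source for every cutoff; only the params differ.
print(script_score_clause(4)["script"] == script_score_clause(15)["script"])
print(script_factor(1, 15), script_factor(4, 4))
```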

I didn't properly understand how the bool query gets the scores from the
function_score queries and how they are combined. For instance, if I use
should, one of the two scores should be greater than 0, but if I use must,
both scores should be greater than 0... I was guessing I need to have
min_score inside the must clause, not at the top level.

The _score is separate from whether a document is included in the results
or not. Only the min_score filter (applied at the end of the query) allows
you to filter out results based on score.

I've written up two example queries based on your data, which do slightly
different things. The first says: give me parents which have more than
$cutoff children, where each child is in this date range and includes the
word "finally"

The second says: give me parents which have more than $cutoff children
within this date range, and more than $cutoff children which contain the
word "finally" (i.e. the two clauses do not have to be true in the same
children).

btw, running multiple has_child queries is likely to be slow. I'd
reconsider your requirements :)

hth

clint



(Marcelo Elias Del Valle) #11

Clinton,

I ran your second query (two.json -
https://gist.github.com/clintongormley/7044466), which BTW is exactly what
I was trying to do, but with a cutoff of 15 in the first has_child query
(text matches "finally"). Using the data and mapping I sent you in the last
e-mail, it returned a positive score, while I was expecting it to return
nothing... Am I missing something? I am using ES 0.90.5 on a 64-bit Debian
Wheezy machine...

{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 3,
"hits": [
{
"_index": "documents",
"_type": "document",
"_id": "1",
"_score": 3,
"_source": {
"title": "Parent document with comments 1",
"content": "several test comments :D"
}
}
]
}
}

Just to check, I ran

{
"query": {
"has_child": {
"type": "comment",
"score_type": "sum",
"boost": 1,
"query": {
"constant_score": {
"query": {
"match": {
"text": "Finally!"
}
}
}
}
}
}
}

and the returned score was 1:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "documents",
"_type": "document",
"_id": "1",
"_score": 1,
"_source": {
"title": "Parent document with comments 1",
"content": "several test comments :D"
}
}
]
}
}

Here is the full query I used, exactly like yours, but with the first
cutoff changed to 15:
{
"query": {
"bool": {
"must": [
{
"function_score": {
"query": {
"has_child": {
"type": "comment",
"score_type": "sum",
"boost": 1,
"query": {
"constant_score": {
"query": {
"match": {
"text": "Finally!"
}
}
}
}
}
},
"boost": "1",
"script_score": {
"params": {
"cutoff": 15
},
"script": "(_score > 0) && (_score < cutoff) ? -1 :1"
},
"boost_mode": "mult"
}
},
{
"function_score": {
"query": {
"has_child": {
"type": "comment",
"score_type": "sum",
"boost": 1,
"query": {
"filtered": {
"filter": {
"range": {
"date": {
"lte": 20130204,
"gte": 20130201
}
}
}
}
}
}
},
"boost": "1",
"script_score": {
"params": {
"cutoff": 4
},
"script": "(_score > 0) && (_score < cutoff) ? -1 :1"
},
"boost_mode": "mult"
}
}
]
}
}
}
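
One way to see where the score of 3 may come from is to redo the arithmetic by hand. The sketch below uses the six sample comments of parent document 1 from this thread. It assumes that each matching child contributes a score of 1, that boost_mode mult multiplies the summed child score by the script result, and that the bool query simply adds the scores of its must clauses. These assumptions are not confirmed by explain output here, and the function names are illustrative.

```python
# Comments of parent document 1, copied from the sample data: (text, dates).
COMMENTS = [
    ("Oh my god ...", [20130201, 20130202]),
    ("Finally!", [20130202]),
    ("text 3", [20130201]),
    ("text 4", [20130112]),
    ("text 5", [20130203]),
    ("text 6", [20130205]),
]


def script_factor(score, cutoff):
    """Mirror of "(_score > 0) && (_score < cutoff) ? -1 : 1"."""
    return -1 if (score > 0 and score < cutoff) else 1


def clause_score(matching_children, cutoff):
    """Summed child score (assumed 1 per child) times the script factor."""
    return matching_children * script_factor(matching_children, cutoff)


# Children whose text matches "finally".
text_matches = sum(1 for text, _ in COMMENTS if "finally" in text.lower())
# Children with at least one date inside the range of the second clause.
date_matches = sum(
    1 for _, dates in COMMENTS if any(20130201 <= d <= 20130204 for d in dates)
)

text_clause = clause_score(text_matches, cutoff=15)
date_clause = clause_score(date_matches, cutoff=4)
total = text_clause + date_clause  # assumed: bool adds its must clauses
print(text_matches, date_matches, text_clause, date_clause, total)
```

Under these assumptions the negative clause is outweighed by the other one, which would explain why the document still comes back with a positive score. The request above also carries no min_score, so nothing is filtered by score either way.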

Best regards,
Marcelo Valle.

2013/10/18 Clinton Gormley clint@traveljury.com

Hi Marcelo

On 18 October 2013 17:28, Marcelo Elias Del Valle mvallebr@gmail.com wrote:

Even without using function score, it does what I want, but function
score is really more flexible because I can have a condition such as (N1 <
score < N2). But even if this solves my problem for one has_child query, what
should I do when I have multiple has_child queries? Each function score would
have boost_mode = mult, right? But wouldn't that mean that if the two
function scores returned -1, I would have a positive number as the final
result?

In this case, change the script to:
"(_score > 0) && (_score < cutoff) ? -1 : 1"

Btw, you should use parameters (like cutoff) rather than putting those
values directly into the script, otherwise every time you change the cutoff
value Elasticsearch has to recompile the script.

I didn't properly understand how the bool query gets the scores from the
function_score queries and how they are combined. For instance, if I use
should, one of the two scores should be greater than 0, but if I use must,
both scores should be greater than 0... I was guessing I need to have
min_score inside the must clause, not at the top level.

The _score is separate from whether a document is included in the results
or not. Only the min_score filter (applied at the end of the query) allows
you to filter out results based on score.

I've written up two example queries based on your data, which do slightly
different things. The first says: give me parents which have more than
$cutoff children, where each child is in this date range and includes the
word "finally"

https://gist.github.com/clintongormley/7044461

The second says: give me parents which have more than $cutoff children
within this date range, and more than $cutoff children which contain the
word "finally" (i.e. the two clauses do not have to be true in the same
children).

https://gist.github.com/clintongormley/7044466

btw, running multiple has_child queries is likely to be slow. I'd
reconsider your requirements :)

hth

clint



(Marcelo Elias Del Valle) #12

Hello,

Just to clarify my last e-mail on this subject: shouldn't the query below
return no results? I am still confused about the way scoring works in bool
queries.

{
"min_score": 0,
"query": {
"bool": {
"must": [
{
"function_score": {
"query": {
"match_all": {}
},
"script_score": {
"script": "-1"
},
"boost_mode": "mult"
}
},
{
"function_score": {
"query": {
"match_all": {}
},
"script_score": {
"script": "1"
},
"boost_mode": "mult"
}
},
{
"function_score": {
"query": {
"match_all": {}
},
"script_score": {
"script": "1"
},
"boost_mode": "mult"
}
}
]
}
}
}

{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0.57735026,
"hits": [
{
"_index": "documents",
"_type": "document",
"_id": "3",
"_score": 0.57735026,
"_source": {
"title": "Parent document with comments 3",
"content": "several test comments :D"
}
},
{
"_index": "documents",
"_type": "document",
"_id": "4",
"_score": 0.57735026,
"_source": {
"title": "Parent document with comments 4",
"content": "several test comments :D"
}
},
{
"_index": "documents",
"_type": "document",
"_id": "1",
"_score": 0.57735026,
"_source": {
"title": "Parent document with comments 1",
"content": "several test comments :D"
}
},
{
"_index": "documents",
"_type": "document",
"_id": "2",
"_score": 0.57735026,
"_source": {
"title": "Parent document with comments 2",
"content": "several test comments :D"
}
}
]
}
}

Best regards,
Marcelo Valle.



(Clinton Gormley) #13

Damn, I knew I should have run that query before posting it :)

So the score of 0.577 that you got in your simple test query is the result
of "normalization". Try running it with ?explain=1 to see the details of
how that score is derived. Essentially it is (-1+1+1)/3 == 1/3 (where 3
is the number of clauses), which then gets "normalized", to return 0.577.
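
As a side note on the arithmetic: 0.57735026 is numerically 1/sqrt(3). The sketch below reproduces it under the assumption that Lucene's classic TF/IDF similarity is in use, where the query norm is 1/sqrt(sum of squared clause weights), each of the three match_all clauses has weight 1, and every clause matches (so the coordination factor is 1). This is only a plausible reconstruction; the explain output remains the authoritative breakdown.

```python
import math

# Values returned by the three script_score scripts in the test query.
script_results = [-1, 1, 1]
# Assumed weight of each match_all clause (boost 1, no idf).
clause_weights = [1.0, 1.0, 1.0]

# Assumed classic query norm: 1 / sqrt(sum of squared weights).
query_norm = 1.0 / math.sqrt(sum(w * w for w in clause_weights))

# boost_mode "mult": script result times the normalized clause score.
clause_scores = [r * w * query_norm for r, w in zip(script_results, clause_weights)]

coord = 1.0  # all three must clauses match
total = coord * sum(clause_scores)
print(total)
```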

The second query as I posted it will not work as you want. I'll think about
it a bit more and see if I can come up with something that does work, but
it is starting to look a lot more complicated :)

clint



(Marcelo Elias Del Valle) #14

Clinton,

I did some tests with boost_mode=replace and realized the bool query
was summing the scores of the sub-queries. I did some tests taking the sum
of the scores into account and it worked, as in the complex query below.
Using a must clause with 3 conditions (3 has_child queries), I would need a
score >= 3 to return the document...
I am not sure I am not still missing something, but I think I was able
to solve this using something close to your suggestion (thanks A LOT for
that, BTW). However, although I am satisfied with the result for now, my
application logic will have to be really complex to create these queries
dynamically, and I am guessing the performance could be much better than it
will probably be. I will have at most 50 queries like this running a day,
so I am not really worried about it now, but I wonder if a new feature in
Elasticsearch would help with this...
Do you guys think Elasticsearch 1.0 will solve this kind of problem,
letting me write custom group-by and aggregate functions?
Do you think it would make sense to create an issue asking for min_score
inside bool queries (should, must and must_not clauses) and offer a reward
for it, to contribute to the current version?
And if I had a max_score as well, I wouldn't need the function score,
so I guess the performance would improve even more, right? Same question
applies...

And Thanks A LOT for the help!!!

{
"min_score": 1,
"query": {
"bool": {
"should": [
{
"function_score": {
"query": {
"has_child": {
"type": "comment",
"score_type": "sum",
"boost": 1,
"query": {
"constant_score": {
"query": {
"match": {
"text": "Finally!"
}
}
}
}
}
},
"boost": "1",
"script_score": {
"params": {
"cutoff": 1
},
"script": "(_score > 0) && (_score < cutoff) ? 0 :2"
},
"boost_mode": "replace"
}
},
{ "bool": {
"must": [
{
"function_score": {
"query": {
"has_child": {
"type": "comment",
"score_type": "sum",
"boost": 1,
"query": {
"constant_score": {
"query": {
"match": {
"text": "Finally!"
}
}
}
}
}
},
"boost": "1",
"script_score": {
"params": {
"cutoff": 1
},
"script": "(_score > 0) && (_score < cutoff) ? 0 :1"
},
"boost_mode": "replace"
}
},
{
"function_score": {
"query": {
"has_child": {
"type": "comment",
"score_type": "sum",
"boost": 1,
"query": {
"filtered": {
"filter": {
"range": {
"date": {
"lte": 20130204,
"gte": 20130201
}
}
}
}
}
}
},
"boost": "1",
"script_score": {
"params": {
"cutoff": 4
},
"script": "(_score > 0) && (_score < cutoff) ? 0 :1"
},
"boost_mode": "replace"
}
}
]
}}
]
}
}
}

Best regards,
Marcelo Valle.

2013/10/18 Marcelo Elias Del Valle mvallebr@gmail.com

Clinton,

I ran your second query (two.json,
https://gist.github.com/clintongormley/7044466), which BTW is exactly
what I was trying to do, but with a cutoff of 15 in the first has_child
query (the one matching the text "Finally!"), using the data and mapping I
sent you in the last e-mail. It returned a positive score, while I was
expecting it to return nothing... Am I missing something? I am using
ES 0.90.5 on a 64-bit Debian Wheezy machine...

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 3,
    "hits": [
      {
        "_index": "documents",
        "_type": "document",
        "_id": "1",
        "_score": 3,
        "_source": {
          "title": "Parent document with comments 1",
          "content": "several test comments :D"
        }
      }
    ]
  }
}

Just to check, I ran

{
  "query": {
    "has_child": {
      "type": "comment",
      "score_type": "sum",
      "boost": 1,
      "query": {
        "constant_score": {
          "query": {
            "match": {
              "text": "Finally!"
            }
          }
        }
      }
    }
  }
}

and the returned score was 1:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "documents",
        "_type": "document",
        "_id": "1",
        "_score": 1,
        "_source": {
          "title": "Parent document with comments 1",
          "content": "several test comments :D"
        }
      }
    ]
  }
}

Here is the full query I used, exactly like yours but with the first
cutoff changed to 15:
{
  "query": {
    "bool": {
      "must": [
        {
          "function_score": {
            "query": {
              "has_child": {
                "type": "comment",
                "score_type": "sum",
                "boost": 1,
                "query": {
                  "constant_score": {
                    "query": {
                      "match": {
                        "text": "Finally!"
                      }
                    }
                  }
                }
              }
            },
            "boost": "1",
            "script_score": {
              "params": {
                "cutoff": 15
              },
              "script": "(_score > 0) && (_score < cutoff) ? -1 : 1"
            },
            "boost_mode": "mult"
          }
        },
        {
          "function_score": {
            "query": {
              "has_child": {
                "type": "comment",
                "score_type": "sum",
                "boost": 1,
                "query": {
                  "filtered": {
                    "filter": {
                      "range": {
                        "date": {
                          "lte": 20130204,
                          "gte": 20130201
                        }
                      }
                    }
                  }
                }
              }
            },
            "boost": "1",
            "script_score": {
              "params": {
                "cutoff": 4
              },
              "script": "(_score > 0) && (_score < cutoff) ? -1 : 1"
            },
            "boost_mode": "mult"
          }
        }
      ]
    }
  }
}

Best regards,
Marcelo Valle.

2013/10/18 Clinton Gormley clint@traveljury.com

Hi Marcelo

On 18 October 2013 17:28, Marcelo Elias Del Valle mvallebr@gmail.com wrote:

Even without using function score, it does what I want, but
function score is really more flexible because I can have a condition such
as (N1 < score < N2). But even if this solves my problem for one has_child
query, what do I do when I have multiple has_child queries? Each function
score would have boost_mode = mult, right? But wouldn't that mean that if
the two function scores returned -1 I would have a positive number as the
final result?

In this case, change the script to: "(_score > 0 ) && (_score < cutoff)
? -1 : 1"

Btw, you should use parameters (like cutoff) rather than putting those
values directly into the script, otherwise every time you change the cutoff
value Elasticsearch has to recompile the script.
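A minimal Python sketch of that advice: the script text stays fixed and only the params value changes between requests. The helper name `cutoff_function_score` is hypothetical, and the request shape simply mirrors the function_score clauses quoted in this thread.

```python
import json

# The script text never changes; only the params differ per clause.
CUTOFF_SCRIPT = "(_score > 0) && (_score < cutoff) ? -1 : 1"


def cutoff_function_score(inner_query, cutoff):
    """Hypothetical helper: build a function_score clause with cutoff in params."""
    return {
        "function_score": {
            "query": inner_query,
            "script_score": {
                "params": {"cutoff": cutoff},
                "script": CUTOFF_SCRIPT,
            },
            "boost_mode": "mult",
        }
    }


# Two clauses with different cutoffs share the same script string.
clause_a = cutoff_function_score({"match_all": {}}, 15)
clause_b = cutoff_function_score({"match_all": {}}, 4)
print(json.dumps(clause_a, indent=2))
```

Both clauses carry an identical script string, so changing a cutoff only changes the params.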

I didn't properly understand how the bool query gets the scores
from the function_score queries and how they are combined. For instance,
if I use should, one of the two scores should be greater than 0, but if I
use must, both scores should be greater than 0... I was guessing I need
to have min_score inside the must clause, not at the top level.

The _score is separate from whether a document is included in the
results or not. Only the min_score filter (applied at the end of the query)
allows you to filter out results based on score.
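To illustrate the point, here is a toy Python model, not Elasticsearch's actual implementation. It assumes a bool query simply adds up its clause scores (ignoring normalization and coord) and that min_score keeps hits whose final score is at or above the threshold; both function names are hypothetical.

```python
def bool_must_score(clause_scores):
    """Toy model: a bool query adds up the scores of its clauses."""
    return sum(clause_scores)


def apply_min_score(hits, min_score):
    """Toy model of min_score, applied once to each hit's final score.

    The inclusive comparison (>=) is an assumption of this sketch.
    """
    return [(doc_id, score) for doc_id, score in hits if score >= min_score]


hits = [
    # One clause scored -1, but the other two outweigh it.
    ("doc-a", bool_must_score([-1, 1, 1])),
    # Every clause scored -1, so the total is negative.
    ("doc-b", bool_must_score([-1, -1, -1])),
]

kept = apply_min_score(hits, 0)
print(kept)
```

In this model a single negative clause does not exclude a document; only the combined score is compared against min_score.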

I've written up two example queries based on your data, which do
slightly different things. The first says: give me parents which have
more than $cutoff children, where each child is in this date range and
includes the word "finally"

https://gist.github.com/clintongormley/7044461

The second says: give me parents which have more than $cutoff children
within this date range, and more than $cutoff children which contain the
word "finally" (i.e. the two clauses do not have to be true in the same
children).

https://gist.github.com/clintongormley/7044466

btw, running multiple has_child queries is likely to be slow. I'd
reconsider your requirements :)

hth

clint

--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr


(system) #15