Query for missing fields using a score


(Amy) #1

Hi,
I had a query that searched certain fields and was scored using a custom
script. I then needed to take account of missing data and found that I'd
need to use a filter for this. The query I came up with (simplified) is:
{
  "query": {
    "custom_filters_score": {
      "query": {
        "match_all": {}
      },
      "filters": [
        {
          "script": "((doc['id'].value == 'hi') ? 2 : 0) + 20",
          "filter": {
            "bool": {
              "should": [
                {
                  "term": {
                    "id": "hello"
                  }
                },
                {
                  "term": {
                    "id": "hi"
                  }
                },
                {
                  "missing": {
                    "field": "jurisdiction",
                    "existence": true,
                    "null_value": true
                  }
                }
              ]
            }
          }
        }
      ]
    }
  },
  "size": 30
}

My problem is that the 'match_all' query returns every document,
whereas I only want documents matched by my filter to be returned. By doctoring
the scoring script I can ensure that filtered documents are at the top of the
list, but I'm worried about performance.

My original query returned only data matching my terms, but I was unable
to use missing data with that. The original query is:
{
  "query": {
    "custom_score": {
      "query": {
        "bool": {
          "should": [
            {
              "term": {
                "id": "hi"
              }
            },
            {
              "term": {
                "id": "hello"
              }
            }
          ]
        }
      },
      "script": "((doc['id'].value == 'hi') ? 2 : 0)"
    }
  }
}

Please note that this is a simplified query; I need the script rather than
a boost.

Is there a better solution?

--


(Clinton Gormley) #2

Hi Amy

I'm unclear from your examples exactly what your query should do. Could
you explain in English what you want to achieve?

ta

clint

--


(Amy) #3

Hi,
To clarify, in this example I want to return all documents where the field
"id" is either "hi" or "hello" or is missing altogether.
With the query I provided without the filter, documents without the field
"id" were not returned, which was not what I wanted.
With the query that had the filter, all documents were returned, which was
also not what I wanted, but I was able to bring the documents I wanted to
the top of the results by adding 20 to the score in the filter. This is not
ideal, and I'm worried about performance, even if I were to change the size
returned to 1.
Does that explain my predicament?
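
The three behaviours described above can be sketched with a toy in-memory model. This is plain Python, not Elasticsearch code; the sample documents and the function names (term_only, match_all, desired, hits) are made up for illustration only.

```python
# Toy model of the matching behaviour described in the thread.
# The documents and helper names below are illustrative assumptions.
docs = [
    {"name": "a", "id": "hi"},
    {"name": "b", "id": "hello"},
    {"name": "c", "id": "other"},
    {"name": "d"},  # no "id" field at all
]

WANTED = {"hi", "hello"}


def term_only(doc):
    """Original bool/should query: only documents whose id is hi or hello."""
    return doc.get("id") in WANTED


def match_all(doc):
    """Scoring filters on top of match_all: every document is still a hit."""
    return True


def desired(doc):
    """What is actually wanted: id is hi or hello, or id is missing."""
    return "id" not in doc or doc["id"] in WANTED


def hits(predicate):
    """Names of the documents accepted by the given predicate."""
    return [d["name"] for d in docs if predicate(d)]


print(hits(term_only))  # ['a', 'b']
print(hits(match_all))  # ['a', 'b', 'c', 'd']
print(hits(desired))    # ['a', 'b', 'd']
```

The point of the model is that a scoring filter only changes the order of the hits, while the wrapped query decides which documents are hits in the first place.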

On Thursday, 11 October 2012 09:10:36 UTC+1, Clinton Gormley wrote:

I'm unclear from your examples exactly what your query should do. Could
you explain in english what you want to achieve?

--


(Clinton Gormley) #4

Hi Amy

Yes - much clearer :)

The exact structure of the query depends on what else you want to do,
but based on the description above, you could do this:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
{
  "query" : {
    "custom_score" : {
      "script" : "some script goes here",
      "query" : {
        "constant_score" : {
          "filter" : {
            "or" : [
              { "terms" : { "id" : [ "hi", "hello" ] }},
              { "missing" : { "field" : "id" }}
            ]
          }
        }
      }
    }
  }
}
'
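
For readers who build the request body in code, here is a minimal sketch of the same structure with the script from the original custom_score query filled in. The helper name build_query is hypothetical, and the body uses the query DSL as it appears in this thread (custom_score, the "or" filter, the "missing" filter); later Elasticsearch releases no longer offer these constructs. How the script scores documents that have no "id" value has not been verified here.

```python
import json

# Script taken from the original custom_score query earlier in the thread.
SCRIPT = "((doc['id'].value == 'hi') ? 2 : 0)"


def build_query(script, field, values):
    """Hypothetical helper: custom_score wrapping a constant_score filter
    that matches documents where `field` is one of `values` OR is missing."""
    return {
        "query": {
            "custom_score": {
                "script": script,
                "query": {
                    "constant_score": {
                        "filter": {
                            "or": [
                                {"terms": {field: list(values)}},
                                {"missing": {"field": field}},
                            ]
                        }
                    }
                },
            }
        }
    }


body = build_query(SCRIPT, "id", ["hi", "hello"])
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the -d payload of the curl command above. Because the filter sits inside the wrapped query, only matching documents are returned, so the "+ 20" offset used with match_all is no longer needed.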

clint

--


(Amy) #5

Thank you Clint, that was exactly what I was looking for!

On Thursday, October 11, 2012 2:10:22 PM UTC+1, Clinton Gormley wrote:

Yes - much clearer :)

--

