Getting different results while using bool query vs bool query with function score query


(akshaysh) #1

I am trying to add a custom boost to each of the should clauses in a
bool query, but I get a different number of results depending on whether
the bool query's two should clauses contain two simple_query_string queries
directly or two function_score queries wrapping the same simple_query_string
queries.
The following query returns 2 results for my data set:
{
  "query" : {
    "filtered" : {
      "query" : {
        "bool" : {
          "should" : [ {
            "simple_query_string" : {
              "query" : "128",
              "fields" : [ "content.name_enu.simple" ]
            }
          }, {
            "simple_query_string" : {
              "query" : "128",
              "fields" : [ "content.name_enu.simple_with_numeric" ]
            }
          } ]
        }
      },
      "filter" : {
        "bool" : {
          "must" : [ {
            "term" : {
              "securityInfo.securityType" : "open"
            }
          }, {
            "bool" : {
              "must" : [ {
                "term" : {
                  "sourceId.sourceSystem" : "jmeter_007971_numeric"
                }
              }, {
                "term" : {
                  "sourceId.type" : "file"
                }
              } ]
            }
          } ],
          "_cache" : true
        }
      }
    }
  },
  "fields" : [ "elementId", "sourceId.id", "sourceId.type",
               "sourceId.sourceSystem", "sourceVersion", "content.name_enu" ]
}

Whereas the following query, which uses the same simple query strings wrapped
in function_score queries, returns 5 results:
{
  "query" : {
    "filtered" : {
      "query" : {
        "bool" : {
          "should" : [ {
            "function_score" : {
              "query" : {
                "simple_query_string" : {
                  "query" : "128",
                  "fields" : [ "content.name_enu.simple" ]
                }
              },
              "boost_factor" : 1.5
            }
          }, {
            "function_score" : {
              "query" : {
                "simple_query_string" : {
                  "query" : "128",
                  "fields" : [ "content.name_enu.simple_with_numeric" ]
                }
              },
              "boost_factor" : 2.5
            }
          } ]
        }
      },
      "filter" : {
        "bool" : {
          "must" : [ {
            "term" : {
              "securityInfo.securityType" : "open"
            }
          }, {
            "bool" : {
              "must" : [ {
                "term" : {
                  "sourceId.sourceSystem" : "jmeter_007971_numeric"
                }
              }, {
                "term" : {
                  "sourceId.type" : "file"
                }
              } ]
            }
          } ],
          "_cache" : true
        }
      }
    }
  },
  "fields" : [ "elementId", "sourceId.id", "sourceId.type",
               "sourceId.sourceSystem", "sourceVersion", "content.name_enu" ]
}
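To make it easy to see that the two requests differ only in whether each should clause is wrapped in a function_score, here is a minimal sketch that builds both bodies from the same pieces. The helper `build_query` and the constant names are hypothetical; the values are copied from the two requests above, and the code only assembles the JSON without sending anything to Elasticsearch.

```python
import json

# Values copied from the two requests above; the names are hypothetical.
SEARCH_TEXT = "128"
FIELDS_AND_BOOSTS = [
    ("content.name_enu.simple", 1.5),
    ("content.name_enu.simple_with_numeric", 2.5),
]
FILTER = {
    "bool": {
        "must": [
            {"term": {"securityInfo.securityType": "open"}},
            {"bool": {"must": [
                {"term": {"sourceId.sourceSystem": "jmeter_007971_numeric"}},
                {"term": {"sourceId.type": "file"}},
            ]}},
        ],
        "_cache": True,
    }
}
RETURNED_FIELDS = ["elementId", "sourceId.id", "sourceId.type",
                   "sourceId.sourceSystem", "sourceVersion", "content.name_enu"]


def build_query(wrap_in_function_score):
    """Build either request body; only the should clauses differ."""
    should = []
    for field, boost in FIELDS_AND_BOOSTS:
        clause = {"simple_query_string": {"query": SEARCH_TEXT, "fields": [field]}}
        if wrap_in_function_score:
            # Same inner query, wrapped as in the second request.
            clause = {"function_score": {"query": clause, "boost_factor": boost}}
        should.append(clause)
    return {
        "query": {"filtered": {"query": {"bool": {"should": should}},
                               "filter": FILTER}},
        "fields": RETURNED_FIELDS,
    }


plain = build_query(wrap_in_function_score=False)
wrapped = build_query(wrap_in_function_score=True)
print(json.dumps(wrapped, indent=2))
```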

From my understanding of how should clauses work, I expected both queries to
return 5 results, and I cannot understand why the first query returns only 2
results for my data set. The "content.name_enu.simple" field uses the simple
analyzer, whereas "content.name_enu.simple_with_numeric" uses the whitespace
tokenizer with a lowercase filter.

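Since the two fields use different analyzers, it may help to see what each one does to the search text. The sketch below is a rough plain-Python approximation, not Elasticsearch code: the simple analyzer's letter tokenizer treats anything that is not a letter as a separator, while the custom analyzer splits only on whitespace before lowercasing. Under this approximation the input 128 yields no tokens at all on the simple field, so that clause has no terms to search for; whether that explains the different result counts is not confirmed in this thread.

```python
import re


def simple_analyzer(text):
    # Approximation of the simple analyzer: keep only runs of letters
    # (digits and punctuation act as separators), then lowercase.
    return [tok.lower() for tok in re.findall(r"[^\W\d_]+", text)]


def whitespace_lowercase_analyzer(text):
    # Approximation of a custom analyzer: whitespace tokenizer + lowercase filter.
    return [tok.lower() for tok in text.split()]


for analyzer in (simple_analyzer, whitespace_lowercase_analyzer):
    print(analyzer.__name__, analyzer("128"), analyzer("Part 128-A"))
```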
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0e31e1c7-8b07-4220-abc9-c520d681495a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Ivan Brusic) #2

The function score should not affect which documents match, only how they are
scored, so the number of results should not differ. Strange.

Perhaps you do not need a function score at all. With the simple query string
query, you can append a boost to the field name:

"simple_query_string": {
"query": "128",
"fields": [
"content.name_enu.simple^1.5"
]
}
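Applied to both clauses of the original request, with the same 1.5 and 2.5 factors used in the function_score version, the bool part might look like the following. The helper `boosted_simple_query_string` is hypothetical and only assembles the request JSON.

```python
import json


def boosted_simple_query_string(text, field, boost):
    # Per-field boost using the "field^boost" syntax shown above.
    return {"simple_query_string": {"query": text,
                                    "fields": ["%s^%s" % (field, boost)]}}


should = [
    boosted_simple_query_string("128", "content.name_enu.simple", 1.5),
    boosted_simple_query_string("128", "content.name_enu.simple_with_numeric", 2.5),
]
print(json.dumps({"bool": {"should": should}}, indent=2))
```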

Since your example search is just a simple term and not Lucene query syntax,
you should probably use a match query, which accepts a boost.

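A sketch of the match-query variant, assuming the expanded form of the match query, where `boost` sits next to `query` inside the field object. The helper `boosted_match` is hypothetical; the field names and boosts come from the original request.

```python
import json


def boosted_match(field, text, boost):
    # Expanded match query form, with a per-clause boost.
    return {"match": {field: {"query": text, "boost": boost}}}


bool_query = {
    "bool": {
        "should": [
            boosted_match("content.name_enu.simple", "128", 1.5),
            boosted_match("content.name_enu.simple_with_numeric", "128", 2.5),
        ]
    }
}
print(json.dumps(bool_query, indent=2))
```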
Cheers,

Ivan

On Tue, Aug 26, 2014 at 4:15 PM, Akshay Shukla akshayshukla.as@gmail.com
wrote:


(Ivan Brusic) #3

Forgot to add that since your search term is the same in both clauses, besides
using match queries you could also use a single multi_match query. Your
queries would be easier to read.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html
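A sketch of what that single multi_match might look like, reusing the per-field boosts from the original request. The choice of `"type": "most_fields"` is an assumption: it adds up the per-field scores, which is closer to a bool of should clauses, whereas the default best_fields type mainly keeps the best-scoring field. Check the linked reference page for the types your version supports.

```python
import json

multi_match = {
    "multi_match": {
        "query": "128",
        # "field^boost" sets a per-field boost.
        "fields": ["content.name_enu.simple^1.5",
                   "content.name_enu.simple_with_numeric^2.5"],
        # Assumption: sum per-field scores, similar to bool should clauses.
        "type": "most_fields",
    }
}
print(json.dumps(multi_match, indent=2))
```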

Not sure why your original query is not working. If you post some example
documents and your mapping, others might be able to figure it out.

--
Ivan

On Wed, Aug 27, 2014 at 11:07 AM, Ivan Brusic ivan@brusic.com wrote:
