I have an index whose documents have a field firstName. If I run the following, for example:
GET _search/
{
  "query": {
    "fuzzy": {
      "firstName": {
        "value": "yvonne",
        "fuzziness": 1
      }
    }
  }
}
I get 5596 hits. Now, if I put the fuzzy query inside a bool must clause:
GET _search/
{
  "query": {
    "bool": {
      "must": [
        {
          "fuzzy": {
            "firstName": {
              "value": "yvonne",
              "fuzziness": 1
            }
          }
        }
      ]
    }
  }
}
I still get 5596. And if I change the must clause to a filter clause:
GET _search/
{
  "query": {
    "bool": {
      "filter": [
        {
          "fuzzy": {
            "firstName": {
              "value": "yvonne",
              "fuzziness": 1
            }
          }
        }
      ]
    }
  }
}
Same, 5596 again.
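The three request bodies above differ only in how the same fuzzy clause is wrapped: bare, inside `bool.must`, or inside `bool.filter`. A minimal Python sketch that generates them, parametrized by fuzziness so the same three variants can be rebuilt for the fuzziness 2 runs. The helper names `fuzzy_clause` and `build_variants` are hypothetical and only build plain dicts; nothing here talks to Elasticsearch.

```python
import json


def fuzzy_clause(field: str, value: str, fuzziness: int) -> dict:
    """Bare fuzzy query on a single field (the first form used above)."""
    return {"fuzzy": {field: {"value": value, "fuzziness": fuzziness}}}


def build_variants(field: str, value: str, fuzziness: int) -> dict:
    """Return the three request bodies being compared, keyed by context."""
    clause = fuzzy_clause(field, value, fuzziness)
    return {
        "plain": {"query": clause},                          # top-level fuzzy query
        "must": {"query": {"bool": {"must": [clause]}}},     # scored (query context)
        "filter": {"query": {"bool": {"filter": [clause]}}}, # unscored (filter context)
    }


if __name__ == "__main__":
    for fuzziness in (1, 2):
        for name, body in build_variants("firstName", "yvonne", fuzziness).items():
            print(f"fuzziness={fuzziness} {name}: {json.dumps(body)}")
```

Each printed body could, for example, be sent to the `_count` endpoint of the index in question (`GET <your-index>/_count`) to compare the hit counts side by side; the index name is a placeholder.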
OK, now I change fuzziness to 2 instead of 1. Running the plain fuzzy query again:
GET _search/
{
  "query": {
    "fuzzy": {
      "firstName": {
        "value": "yvonne",
        "fuzziness": 2
      }
    }
  }
}
Now I get 6079 hits. A larger edit distance should match more documents, so that seems reasonable. Now I put that inside a bool query as a must clause again:
GET _search/
{
  "query": {
    "bool": {
      "must": [
        {
          "fuzzy": {
            "firstName": {
              "value": "yvonne",
              "fuzziness": 2
            }
          }
        }
      ]
    }
  }
}
Still 6079. Now I change the must clause to a filter:
GET _search/
{
  "query": {
    "bool": {
      "filter": [
        {
          "fuzzy": {
            "firstName": {
              "value": "yvonne",
              "fuzziness": 2
            }
          }
        }
      ]
    }
  }
}
This returns 7980 hits.
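The expectation that fuzziness 2 should match at least everything fuzziness 1 matches comes from what fuzziness means: a maximum edit distance between the query term and an indexed term. Elasticsearch's fuzzy query also counts a transposition of two adjacent characters as one edit by default (its `transpositions` parameter). The sketch below computes that distance (the optimal string alignment variant) pairwise over a hypothetical term list, only to illustrate that the set of terms within distance 1 is always a subset of the set within distance 2. Lucene does not find matching terms this way internally, and the vocabulary is made up.

```python
from __future__ import annotations


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def matching_terms(query: str, vocabulary: list[str], fuzziness: int) -> set[str]:
    """Terms whose edit distance from `query` is at most `fuzziness`."""
    return {term for term in vocabulary if osa_distance(query, term) <= fuzziness}


# Hypothetical term dictionary for the firstName field.
vocab = ["yvonne", "yvone", "ivonne", "yvonen", "ivone", "evonna", "yvette"]
m1 = matching_terms("yvonne", vocab, 1)
m2 = matching_terms("yvonne", vocab, 2)
print(sorted(m1))
print(sorted(m2 - m1))
```

Raising fuzziness can only add terms to the candidate set, never remove them, which is why the increase from 5596 to 6079 hits looks reasonable on its own.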
As I understand it, the only difference between must and filter clauses in a bool query is whether hits are scored. But that doesn't seem to be true here: with fuzziness 2, running the fuzzy query in filter context makes it less selective (7980 hits versus 6079), while with fuzziness 1 all three forms return the same 5596 hits. I can't find anything in the Elastic docs that explains this behavior. What am I missing? Thanks in advance.