The fuzzier the match, the higher the score?

Hello!

My template is:

        ...,
        "code": {
          "type": "keyword",
          "copy_to": "full_text"
        },
        ...,
        "full_text": {
          "type": "text"
        }

My query is:

{
    "bool": {
        "must": {
            "match": {
                "full_text": {
                    "query": "AD2480ME",
                    "operator": "and",
                    "fuzziness": "AUTO"
                }
            }
        }
    }
}

And response is:

        {
            ...,
            "_score": 2.7549238,
            "_source": {
                "code": "AD3440ME",
                ...
            }
        },
        {
            ...,
            "_score": 2.7438653,
            "_source": {
                "code": "AD2480ME",
                ...
            }
        }

So the question is: why does the exactly matching record have a lower score than the fuzzy one?

The Explain API can help diagnose what's going on. I suspect it may have to do with the IDF (rarity) of the terms.
What version of Elasticsearch are you running, and how many shards/docs per shard do you have?
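(For reference, the per-shard document counts can be listed with the `_cat/shards` API; the `docs` column shows how many documents each shard holds:)

```
GET _cat/shards/my_index?v
```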

Elasticsearch version 6.5.2

Actually I've met the issue on my development dataset:

$ curl -XGET 'http://localhost:9200/my_index/_count'
{
    "count": 18,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    }
}

The mentioned code property is unique across the documents in the index.

Not sure what you mean by

But the Explain API has brought unexpected results.

First of all I did:

GET /my_index/_search
{
    "query": {
        "bool": {
            "must": {
                "match": {
                    "full_text": {
                        "query": "AD2480ME",
                        "operator": "and",
                        "fuzziness": "AUTO"
                    }
                }
            }
        }
    }
}

And got:

{
    "took": 22,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.7549239,
        "hits": [
            {
                "_index": "my_index",
                "_type": "doc",
                "_id": "31",
                "_score": 0.7549239,
                "_source": {
                    "code": "AD3440ME"
                }
            },
            {
                "_index": "my_index",
                "_type": "doc",
                "_id": "22",
                "_score": 0.7438652,
                "_source": {
                    "code": "AD2480ME"
                }
            }
        ]
    }
}

So the next thing I did was:

GET /my_index/_default_/31/_explain

and

GET /my_index/_default_/22/_explain

both with

{
    "query": {
        "bool": {
            "must": {
                "match": {
                    "full_text": {
                        "query": "AD2480ME",
                        "operator": "and",
                        "fuzziness": "AUTO"
                    }
                }
            }
        }
    }
}

And both returned the same

{
    "_index": "my_index",
    "_type": "_default_",
    "_id": "22",
    "matched": false,
    "explanation": {
        "value": 0,
        "description": "Failure to meet condition(s) of required/prohibited clause(s)",
        "details": []
    }
}

with a lot of different details about score calculation, but with the same "matched": false.

So, if I understand correctly, the exactly matching doc is not being seen as a match at all.

Use “doc”, not “_default_”, here
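(That is, using the mapping type shown in the search response above, the explain requests become:)

```
GET /my_index/doc/31/_explain
GET /my_index/doc/22/_explain
```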

18 docs in 5 shards will have some funky scoring. The number of docs in each shard will be very uneven (possibly varying by 25%), so the IDF score of a unique term will be affected.

Choices are:

  • Add more docs
  • Use one shard, or
  • Use “DFS” style search
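A rough sketch (plain Python, not Elasticsearch code) of why the shard layout matters: with the default `query_then_fetch` search type, each shard computes IDF from its own local document count, so a term that is unique in the whole index can score differently depending on which shard its document landed on. The function below uses Lucene's BM25 IDF formula; the example shard sizes (3 vs. 5 docs) are illustrative, not taken from the actual index.

```python
import math

def bm25_idf(doc_count: int, docs_with_term: int) -> float:
    """Lucene's BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    n = docs_with_term
    return math.log(1 + (doc_count - n + 0.5) / (n + 0.5))

# 18 docs spread over 5 shards: each shard sees a different local N.
# A term unique on its shard (n = 1) gets a higher IDF on a shard
# that happens to hold more documents.
idf_small_shard = bm25_idf(doc_count=3, docs_with_term=1)
idf_large_shard = bm25_idf(doc_count=5, docs_with_term=1)

print(round(idf_small_shard, 4))  # IDF on a 3-doc shard
print(round(idf_large_shard, 4))  # IDF on a 5-doc shard
```

The “DFS” option (`GET /my_index/_search?search_type=dfs_query_then_fetch`) avoids this by collecting global term statistics before scoring.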

Thank you for your time, attention to detail, corrections and right URLs to read!

As soon as I switched the local development elasticsearch instance to one shard, the funky scoring disappeared and everything fell into place!
