Identical documents have different scores when using fuzziness


(Burrito) #1

I'm having trouble understanding why identical documents sometimes get
different scores. This gist (a sh script) creates an index and adds 4
documents. Each document has a single string property, "actor", and 3
of the 4 documents have the same value, "crime".

This also happens without fuzziness, but in that case it can be managed
by setting a DFS search_type (e.g. dfs_query_then_fetch).
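A minimal sketch (not Elasticsearch code) of why per-shard statistics matter. It assumes Lucene's classic TF-IDF similarity (the default in Elasticsearch 1.x), a plain single-term query for "crime", tf = 1 and a field norm of 1; the shard layout in `SHARDS` is hypothetical. Under those assumptions the score reduces to idf, which the default query_then_fetch computes from each shard's own document count and document frequency, whereas a DFS search type gathers those statistics across all shards first.

```python
import math

# Hypothetical layout: one "actor" value per document, spread over shards.
SHARDS = {
    "shard0": ["crime"],
    "shard1": ["crime"],
    "shard2": ["crime", "crimeq"],
}


def classic_idf(num_docs: int, doc_freq: int) -> float:
    # Lucene classic TF-IDF: idf = 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def single_term_score(num_docs: int, doc_freq: int, tf: int = 1,
                      field_norm: float = 1.0) -> float:
    idf = classic_idf(num_docs, doc_freq)
    # Single-term query, boost 1: queryNorm = 1 / sqrt(idf^2) = 1 / idf,
    # so the query weight (idf * queryNorm) is 1.
    query_weight = idf * (1.0 / idf)
    # Field weight: sqrt(tf) * idf * fieldNorm
    field_weight = math.sqrt(tf) * idf * field_norm
    return round(query_weight * field_weight, 8)


def per_shard_scores(shards, term):
    """Each shard scores with its own numDocs and docFreq (query_then_fetch)."""
    results = []
    for name, docs in shards.items():
        df = docs.count(term)
        for doc in docs:
            if doc == term:
                results.append((name, single_term_score(len(docs), df)))
    return results


def dfs_scores(shards, term):
    """Statistics are gathered across all shards first (DFS), then reused."""
    all_docs = [doc for docs in shards.values() for doc in docs]
    df = all_docs.count(term)
    return [(name, single_term_score(len(all_docs), df))
            for name, docs in shards.items() for doc in docs if doc == term]


print(per_shard_scores(SHARDS, "crime"))
print(dfs_scores(SHARDS, "crime"))
```

With this hypothetical layout, a shard holding a single "crime" document yields 1 + ln(1/2) ≈ 0.30685282, the same value as the two tied "crime" hits in example response 1, which is consistent with those documents each sitting on a nearly empty shard. With global (DFS) statistics, all three "crime" documents score 1.0.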

Why do identical documents get different scores when fuzziness is
used? Does distributed IDF not work when fuzziness is applied?
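A fuzzy query is not scored as one term: on each shard it is rewritten into the indexed terms within the allowed edit distance, and each shard only sees its own term dictionary. The sketch below assumes fuzziness 1 (the actual query is in the gist and not shown here) and a hypothetical split of terms across shards (`SHARD_TERMS`); it uses plain Levenshtein distance, whereas Lucene can also count transpositions.

```python
# Hypothetical per-shard term dictionaries for the "actor" field.
SHARD_TERMS = {
    "shard0": {"crime"},
    "shard2": {"crime", "crimeq"},
}


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def expand_fuzzy(term: str, terms: set, max_edits: int) -> list:
    """Terms from one shard's dictionary within max_edits of the query term."""
    return sorted(t for t in terms if levenshtein(term, t) <= max_edits)


for shard, terms in SHARD_TERMS.items():
    print(shard, expand_fuzzy("crime", terms, max_edits=1))
```

Here "crime" expands to only ["crime"] on one shard but to ["crime", "crimeq"] on the other. If the rewritten queries differ from shard to shard, so can their weighting and normalization, which is one plausible reason a DFS search type alone may not make the scores line up.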

Any help is appreciated.

(Note: you may have to run the script several times to reproduce this.)

Example response 1:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.8574929,
    "hits" : [ {
      "_index" : "movies",
      "_type" : "actors",
      "_id" : "-4RFpx0QSO-qGnA6jvyxoQ",
      "_score" : 0.8574929, "_source" : {"actor" : "crime"}
    }, {
      "_index" : "movies",
      "_type" : "actors",
      "_id" : "fzt7bXKzRFS1Jll7rwYbOQ",
      "_score" : 0.5144958, "_source" : {"actor" : "crimeq"}
    }, {
      "_index" : "movies",
      "_type" : "actors",
      "_id" : "NE0o8i7USvCCXwQjLowmfg",
      "_score" : 0.30685282, "_source" : {"actor" : "crime"}
    }, {
      "_index" : "movies",
      "_type" : "actors",
      "_id" : "epcgMCtVTaiF0uyY475X9Q",
      "_score" : 0.30685282, "_source" : {"actor" : "crime"}
    } ]
  }
}

Example response 2:

{
  "hits" : {
    "total" : 4,
    "max_score" : 1.2066344,
    "hits" : [ {
      "_index" : "movies",
      "_type" : "actors",
      "_id" : "0kTpg2KmRLuPQ1balKCUwQ",
      "_score" : 1.2066344, "_source" : {"actor" : "crimeq"}
    }, {
      "_index" : "movies",
      "_type" : "actors",
      "_id" : "GkPtql3_SJGhl3vu0ea27g",
      "_score" : 1.0, "_source" : {"actor" : "crime"}
    }, {
      "_index" : "movies",
      "_type" : "actors",
      "_id" : "zBkRROcvTUedKHBG_shT2g",
      "_score" : 0.70151186, "_source" : {"actor" : "crime"}
    }, {
      "_index" : "movies",
      "_type" : "actors",
      "_id" : "jsp4klQ8T4yBtRQ6Nn13lA",
      "_score" : 0.70151186, "_source" : {"actor" : "crime"}
    } ]
  }
}
