Is it possible to calculate a relative score from 0 to 1 when searching for documents similar to an existing one?
I need to calculate a relative score from 0 to 1 when searching for documents similar to an existing one. The existing document should have a score of 1, and the scores of all other matching documents should be calculated relative to it. However, the existing document itself must be excluded from the search results. Is it possible to do this on the Elasticsearch side, rather than calculating the score manually in a programming language, like: match_doc_score/search_doc_score?
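For reference, the manual client-side approach (match_doc_score/search_doc_score) amounts to the minimal Python sketch below. It assumes you have already obtained the existing document's own score for the same query (for example by running the query once without the must_not clause) and the raw `_score` of each hit. The function name `relative_scores` and the sample scores are hypothetical.

```python
def relative_scores(reference_score, hits):
    """Divide each hit's raw score by the reference document's own score.

    reference_score: _score the existing document gets for the same query
    hits: list of (doc_id, raw_score) pairs, with the reference doc excluded
    """
    if reference_score <= 0:
        raise ValueError("reference_score must be positive")
    # No clamping: with BM25 scoring, a result above 1.0 is possible in principle
    return [(doc_id, score / reference_score) for doc_id, score in hits]


# Hypothetical raw scores, for illustration only
print(relative_scores(10.0, [("2", 7.5), ("3", 2.5)]))
```

This is exactly the extra round trip and post-processing the question is trying to avoid, which is why it is only shown as a baseline.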
Let's imagine we have an index person
with the following mapping:
{
"properties": {
"person_id": {
"type": "keyword"
},
"fullname": {
"type": "text"
},
"email": {
"type": "keyword"
},
"phone": {
"type": "keyword"
},
"country_of_birth": {
"type": "keyword"
}
}
}
And I have 3 persons inside the index:
Person 1:
{
"person_id": 1,
"fullname": "John Snow",
"email": "john@gmail.com",
"phone": "111-11-11",
"country_of_birth": "Denmark"
}
Person 2:
{
"person_id": 2,
"fullname": "Snow John",
"email": "john@gmail.com",
"phone": "222-22-22",
"country_of_birth": "Denmark"
}
Person 3:
{
"person_id": 3,
"fullname": "Peter Wislow",
"email": "peter@gmail.com",
"phone": "111-11-11",
"country_of_birth": "Denmark"
}
We find persons similar to Person 1 with this query:
{
"query": {
"bool": {
"should": [
{
"match": {
"fullname": {
"query": "John Snow",
"boost": 6
}
}
},
{
"term": {
"email": {
"value": "john@gmail.com",
"boost": 5
}
}
},
{
"term": {
"phone": {
"value": "111-11-11",
"boost": 4
}
}
},
{
"term": {
"country_of_birth": {
"value": "Denmark",
"boost": 2
}
}
}
],
"must_not": [
{
"term": {
"person_id": "1"
}
}
]
}
}
}
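In practice this query is generated from the existing document's field values. A minimal Python sketch of that step follows; it builds the request body as a plain dict, uses `match` for the analyzed `fullname` text field and `term` for the keyword fields, and excludes the source document by its `person_id`. The names `BOOSTS`, `TEXT_FIELDS` and `build_similarity_query` are hypothetical, and the boost values are copied from the query above.

```python
# Hypothetical field -> boost weights, copied from the query above
BOOSTS = {"fullname": 6, "email": 5, "phone": 4, "country_of_birth": 2}
# Analyzed text fields get a "match" clause; keyword fields get a "term" clause
TEXT_FIELDS = {"fullname"}


def build_similarity_query(source, boosts=BOOSTS):
    """Build a bool query that finds documents similar to `source`."""
    should = []
    for field, boost in boosts.items():
        value = source.get(field)
        if value is None:
            continue  # skip fields the source document does not have
        if field in TEXT_FIELDS:
            should.append({"match": {field: {"query": value, "boost": boost}}})
        else:
            should.append({"term": {field: {"value": value, "boost": boost}}})
    return {
        "query": {
            "bool": {
                "should": should,
                # Exclude the source document itself; person_id is a keyword field
                "must_not": [{"term": {"person_id": str(source["person_id"])}}],
            }
        }
    }


person_1 = {
    "person_id": 1,
    "fullname": "John Snow",
    "email": "john@gmail.com",
    "phone": "111-11-11",
    "country_of_birth": "Denmark",
}
body = build_similarity_query(person_1)
```

The resulting dict can be sent as the body of a `_search` request against the person index with any HTTP client.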
As you can see:
- Person 1 and Person 2 match on fullname, email and country_of_birth.
- Person 1 and Person 3 match on phone and country_of_birth.
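To make the expected numbers concrete, here is the arithmetic under one simplifying assumption: every should clause contributes exactly its boost. That would only hold if each clause were wrapped in a constant_score query (whose score equals its boost) and the bool query simply sums the matching clauses, as in recent Elasticsearch versions. With the plain match/term clauses above, BM25 term statistics also affect each clause's score, so real scores will differ. `constant_score_ratio` is a hypothetical helper.

```python
# Hypothetical per-clause boosts, copied from the query above
BOOSTS = {"fullname": 6, "email": 5, "phone": 4, "country_of_birth": 2}


def constant_score_ratio(matched_fields, boosts=BOOSTS):
    """Relative score if every matching clause contributed exactly its boost.

    Assumption: each should clause is wrapped in constant_score. The reference
    document (Person 1) matches every clause, so it would score sum(boosts).
    """
    max_score = sum(boosts.values())  # 6 + 5 + 4 + 2 = 17
    return sum(boosts[f] for f in matched_fields) / max_score


print(round(constant_score_ratio({"fullname", "email", "country_of_birth"}), 4))  # 0.7647 (Person 2: 13/17)
print(round(constant_score_ratio({"phone", "country_of_birth"}), 4))  # 0.3529 (Person 3: 6/17)
```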
Is it possible to get 0..1 scoring, given that the index contains a document that fully matches the query (Person 1)?
I know there is a more_like_this query, but in real life the search queries can be complicated, so more_like_this
is not a good option. Even the Elasticsearch documentation says that if you need more control over the query, you should use bool query combinations.