I want to implement my own scoring algorithm for a use case that requires finding one distinct document from a query result set. This requires access to two key pieces of information:
- The number of query terms as analyzed by ES
- The number of analyzed terms in the document's field
I do not need access to the inverted index, and I don't care what the terms are during scoring. All I need are the two term counts listed above.
Consider a simple example set with the following analyzed terms:
doc1 = ["pizza"]
doc2 = ["cheese", "pizza"]
doc3 = ["large", "cheese", "pizza"]
doc4 = ["small", "cheese", "pizza"]
My desired scoring function could not be simpler once documents match full-text terms with an AND operation:
score = # analyzed query terms / # analyzed field terms
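To make the intended behavior concrete, here is a minimal plain-Python sketch of the formula. It is not Painless and says nothing about how Elasticsearch would compute this; the function name `score` and the use of `Fraction` (to keep ratios like 2/3 exact) are my own illustrative choices, and the term lists stand in for whatever the analyzer actually produces.

```python
from fractions import Fraction


def score(query_terms, field_terms):
    """Desired score: (# query terms) / (# field terms), for AND matches only.

    Returns None when the document does not contain every query term,
    mirroring the "operator": "and" requirement.
    """
    if not set(query_terms) <= set(field_terms):
        return None  # document is not a match at all
    return Fraction(len(query_terms), len(field_terms))


# Hypothetical inputs: already-analyzed term lists from the example set.
print(score(["cheese", "pizza"], ["cheese", "pizza"]))           # doc2
print(score(["cheese", "pizza"], ["large", "cheese", "pizza"]))  # doc3
print(score(["large"], ["cheese", "pizza"]))                     # no match
```

Only the two counts matter once the AND match succeeds; the terms themselves are used here solely to decide whether the document matches.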
Consider these queries to find distinct matches above:
- I want to find doc1. The distinct query is "pizza", with the highest score of 1.
- I want to find doc2. The distinct query is "cheese pizza", with the highest score of 2/2 = 1 (doc3 and doc4 also match, but score only 2/3).
- I want to find doc3. The distinct query is "large", "large pizza", "large cheese" or "large cheese pizza", with the highest scores of 1/3, 2/3, 2/3 and 3/3, respectively.
- I want to find doc4. The distinct query is "small", "small cheese", "small pizza" or "small cheese pizza", with the highest scores of 1/3, 2/3, 2/3 and 3/3, respectively.
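The rankings I expect for these queries can be checked with a short plain-Python sketch over the example set. Again, this is only an illustration of the desired outcome, not Elasticsearch or Painless code; `DOCS`, `rank`, and the whitespace split used in place of the real analyzer are assumptions made for the example.

```python
from fractions import Fraction

# Example set from above, as already-analyzed term lists.
DOCS = {
    "doc1": ["pizza"],
    "doc2": ["cheese", "pizza"],
    "doc3": ["large", "cheese", "pizza"],
    "doc4": ["small", "cheese", "pizza"],
}


def rank(query):
    """Return (doc, score) pairs for AND matches, highest score first."""
    terms = query.split()  # stand-in for the analyzer (assumption)
    hits = [
        (name, Fraction(len(terms), len(field)))
        for name, field in DOCS.items()
        if set(terms) <= set(field)  # every query term must be present
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


for query in ["pizza", "cheese pizza", "large pizza", "small cheese pizza"]:
    print(query, "->", rank(query))
```

In each case the document I am after comes out on top, which is the behavior I want the custom score to reproduce.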
I'm a novice at Painless scripting. Can I access the two counts above per document in order to compute this score? And if so, how do I write the scoring function?
Below are sample mappings, documents, and a desired query, waiting for my custom scoring function.
Thanks!
PUT food
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "english"
        }
      }
    }
  }
}

PUT food/_doc/1
{
  "name": "pizza"
}

PUT food/_doc/2
{
  "name": "cheese pizza"
}

PUT food/_doc/3
{
  "name": "large cheese pizza"
}

PUT food/_doc/4
{
  "name": "small cheese pizza"
}

GET food/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": {
              "query": "cheese pizza",
              "operator": "and",
              "fuzziness": "AUTO",
              "prefix_length": 2
            }
          }
        }
      ]
    }
  }
}