How to calculate a score by dividing the number of occurrences by the field length?

I would like to implement my own scoring algorithm using script_score. It should take the number of occurrences of the searched words and divide it by the field length. But I don't know how to get both of these values from Elasticsearch at the same time. I tried something like this in the index settings:

    "settings": {
        "similarity": {
            "my_similarity": {
                "type": "scripted",
                "script": {
                    "source": "return doc.freq/doc.length;"
                }
            }
        }
    }

But the problem is that doc.length is not the length of the field. I also tried to calculate it on the query side:

    "query": {
        "function_score": {
            "script_score": {
                "script": {
                    "lang": "painless",
                    "inline": "freq/doc['some_field'].length;"
                }
            }
        }
    }

But here I can't access the freq variable. Is there any way to compute it?

Maybe it is possible to expose doc.freq as a field through some mapping configuration and then do the calculation on the query side?

You may want to look at the scripted similarity instead of a function_score query to implement this.
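A minimal sketch of that approach, assuming an Elasticsearch 6.x index that still uses a mapping type (as in the mapping later in this thread). The similarity is defined in the index settings and only takes effect on fields that reference it through their similarity parameter; the script simply divides doc.freq by doc.length, which is the division the question is after.

    PUT /some_index
    {
        "settings": {
            "similarity": {
                "my_similarity": {
                    "type": "scripted",
                    "script": {
                        "source": "return doc.freq / doc.length;"
                    }
                }
            }
        },
        "mappings": {
            "some_info": {
                "properties": {
                    "some_text_field": {
                        "type": "text",
                        "similarity": "my_similarity"
                    }
                }
            }
        }
    }

The script is evaluated once per matching term, so how several query words end up combined depends on the query that wraps them.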

Yes, I already checked it out, but I don't think I can achieve what I want with the scripted similarity API. As I mentioned in my previous post, the problem is the doc.length property, which does not seem to be the field length. This single property returns different values for the same document field when I use different queries.

Have you used explain: true in the search request to check the different doc.length values? Can you provide a fully reproducible example with the similarity, index creation, mapping and the queries that return different lengths (the difference might be due to the mapping)?
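For example, a request body equivalent to the URI search ?q=Lorem Ipsum, with explanations turned on, could look like the sketch below. It assumes the index name some_index used later in this thread; the q parameter of a URI search is handled by a query_string query, so this body should produce the same scoring.

    GET /some_index/_search
    {
        "explain": true,
        "query": {
            "query_string": {
                "query": "Lorem Ipsum"
            }
        }
    }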

This is my mapping, in which the scripted similarity simply returns the single property doc.length:

    {
        "mappings": {
            "some_info": {
                "_all": {
                    "enabled": "false"
                },
                "properties": {
                    "some_text_field": {
                        "type": "text",
                        "index": "false",
                        "fields": {
                            "ngram": {
                                "type": "text",
                                "analyzer": "custom_nGram_analyzer",
                                "similarity": "my_similarity"
                            }
                        }
                    }
                }
            }
        },
        "settings": {
            "analysis": {
                "analyzer": {
                    "custom_nGram_analyzer": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": [
                            "lowercase",
                            "asciifolding",
                            "custom_nGram_filter"
                        ]
                    }
                },
                "filter": {
                    "custom_nGram_filter": {
                        "type": "ngram",
                        "token_chars": [],
                        "min_gram": 3,
                        "max_gram": 16
                    }
                },
                "normalizer": {
                    "custom_lowercase_normalizer": {
                        "type": "custom",
                        "filter": [
                            "lowercase"
                        ]
                    }
                }
            },
            "similarity": {
                "my_similarity": {
                    "type": "scripted",
                    "script": {
                        "source": "return doc.length;"
                    }
                }
            }
        }
    }
Data:

    POST /some_index/some_info
    {
        "some_text_field": "Lorem Ipsum is simply ...."
    }
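To see which tokens custom_nGram_analyzer produces for this field, and at which positions, the _analyze API can be pointed at the subfield. This is a sketch reusing the index and field names from the mapping above; comparing its output with the doc.length reported by explain helps show what is actually being counted.

    GET /some_index/_analyze
    {
        "field": "some_text_field.ngram",
        "text": "Lorem Ipsum"
    }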

When I run GET /some_index/_search?q=Lorem, I get a score of 76.0, but when I run GET /some_index/_search?q=Lorem Ipsum, I get a score of 152.0.

The _explanation shows something like this:

    "_explanation": {
        "value": 152.0,
        "description": "max of:",
        "details": [
            {
                "value": 152.0,
                "description": "sum of:",
                "details": [
                    {
                        "value": 76.0,
                        "description": "weight(Synonym(some_text_field.ngram:lor some_text_field.ngram:lore some_text_field.ngram:lorem some_text_field.ngram:ore some_text_field.ngram:orem some_text_field.ngram:rem) in 0) [PerFieldSimilarity], result of:",
                        "details": [
                            {
                                "value": 76.0,
                                "description": "score from ScriptedSimilarity(weightScript=[null], script=[Script{type=inline, lang='painless', idOrCode='return doc.length;', options={}, params={}}]) computed from:",
                                "details": [
                                    ...
                                    {
                                        "value": 76.0,
                                        "description": "doc.length",
                                        "details": []
                                    }
                                ]
                            }
                        ]
                    },
                    {
                        "value": 76.0,
                        "description": "weight(Synonym(some_text_field.ngram:ips some_text_field.ngram:ipsu some_text_field.ngram:ipsum some_text_field.ngram:psu some_text_field.ngram:psum some_text_field.ngram:sum) in 0) [PerFieldSimilarity], result of:",
                        "details": [
                            {
                                "value": 76.0,
                                "description": "score from ScriptedSimilarity(weightScript=[null], script=[Script{type=inline, lang='painless', idOrCode='return doc.length;', options={}, params={}}]) computed from:",
                                "details": [
                                    ...
                                    {
                                        "value": 76.0,
                                        "description": "doc.length",
                                        "details": []
                                    }
                                ]
                            }
                        ]
                    }
                ]
            }
        ]
    }

Is there any way to get doc.length without it being summed?
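In the explanation above, each query word produces its own clause that scores 76.0 (its doc.length), and the "sum of:" node adds the two clauses up to 152.0. If only the best-matching word should count, one option is a dis_max query with one clause per word: with tie_breaker set to 0, dis_max takes the highest clause score instead of the sum. This is an untested sketch that reuses the field from the mapping above.

    GET /some_index/_search
    {
        "explain": true,
        "query": {
            "dis_max": {
                "tie_breaker": 0.0,
                "queries": [
                    { "match": { "some_text_field.ngram": "Lorem" } },
                    { "match": { "some_text_field.ngram": "Ipsum" } }
                ]
            }
        }
    }

For the original goal (occurrences divided by field length), summing per-word values may actually be what is wanted, so which combination fits depends on the intended formula.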
