# I need help writing a simple custom scoring function

I want to implement my own scoring algorithm for a use case that requires finding one distinct document from a query result set. This requires access to two key pieces of information:

1. The number of query terms as analyzed by ES
2. The number of analyzed terms in the document's field

I do not need access to the inverted index, and I don't care what the terms are during scoring. All I need are the two term counts listed above.

Consider a simple example set with the following analyzed terms:

doc1 = ["pizza"]
doc2 = ["cheese", "pizza"]
doc3 = ["large", "cheese", "pizza"]
doc4 = ["small", "cheese", "pizza"]

My desired scoring function could not be simpler. Once a document matches all of the full-text query terms (AND operator), its score is:

score = # analyzed query terms / # analyzed field terms

Consider these queries to find distinct matches above:

1. I want to find doc1. The distinct query is "pizza" with the highest score of 1.

2. I want to find doc2. The distinct query is "cheese pizza" with the highest score of 2/2 = 1 (doc3 and doc4 also match, but score only 2/3).

3. I want to find doc3. The distinct query is "large", "large pizza", "large cheese" or "large cheese pizza" with the highest scores of 1/3, 2/3, 2/3 and 3/3, respectively.

4. I want to find doc4. The distinct query is "small", "small cheese", "small pizza" or "small cheese pizza" with the highest scores of 1/3, 2/3, 2/3, and 3/3, respectively.

I'm a novice with Painless scripting. Can I access the two counts above for each document in order to apply my simple scoring algorithm? If so, how do I write this scoring function?

Below are sample mappings, documents, and a desired query, waiting for my custom scoring function.

Thanks!

``````
PUT food
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "english"
        }
      }
    }
  }
}

PUT food/_doc/1
{
  "name": "pizza"
}

PUT food/_doc/2
{
  "name": "cheese pizza"
}

PUT food/_doc/3
{
  "name": "large cheese pizza"
}

PUT food/_doc/4
{
  "name": "small cheese pizza"
}

GET food/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": {
              "query": "cheese pizza",
              "operator": "and",
              "fuzziness": "AUTO",
              "prefix_length": 2
            }
          }
        }
      ]
    }
  }
}
``````

UPDATE: I found that it's possible to index a sub-field containing the number of analyzed terms (see the mapping below). But how do I compute, in a Painless script, the number of terms in a query string passed as a param, analyzed as full text with the same analyzer?

``````
PUT food
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "english",
          "store": true,
          "fields": {
            "length": {
              "type": "token_count",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}
``````

Then this query returns that field:

``````
GET food/_search
{
  "_source": "*",
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": {
              "query": "cheese pizza",
              "operator": "and",
              "fuzziness": "AUTO",
              "prefix_length": 2
            }
          }
        }
      ]
    }
  },
  "script_fields": {
    "name_term_count": {
      "script": {
        "lang": "painless",
        "source": "doc['name.length']"
      }
    }
  }
}
``````

Example hit:

``````
{
  "_index": "food",
  "_type": "_doc",
  "_id": "3",
  "_score": 0.5753642,
  "_source": {
    "name": "large cheese pizza"
  },
  "fields": {
    "name_term_count": [
      3
    ]
  }
}
``````

Of course, I can run the following request first, but that incurs an extra network round trip just to obtain the query term count:

``````
GET food/_analyze
{
  "field": "name",
  "text": "large cheese pizza"
}
``````

Result (as printed by my client, with the token count appended):

``````
{
  tokens: [
    { token: 'larg',  start_offset: 0,  end_offset: 5,  type: '<ALPHANUM>', position: 0 },
    { token: 'chees', start_offset: 6,  end_offset: 12, type: '<ALPHANUM>', position: 1 },
    { token: 'pizza', start_offset: 13, end_offset: 18, type: '<ALPHANUM>', position: 2 }
  ]
}
terms: 3
``````
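
For reference, here is a minimal sketch of how the scoring function itself might look, assuming the query term count is obtained client-side (for example from the `_analyze` call above) and passed in as a script parameter. It uses a `function_score` query whose `script_score` divides that parameter by the `name.length` token count from the mapping above. The parameter name `query_term_count` is hypothetical, and I have not verified this against a specific Elasticsearch version.

``````
# Sketch only: "query_term_count" is an assumed parameter name, supplied by the client
GET food/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "name": {
            "query": "cheese pizza",
            "operator": "and",
            "fuzziness": "AUTO",
            "prefix_length": 2
          }
        }
      },
      "script_score": {
        "script": {
          "lang": "painless",
          "params": {
            "query_term_count": 2
          },
          "source": "params.query_term_count / (double) doc['name.length'].value"
        }
      },
      "boost_mode": "replace"
    }
  }
}
``````

With `boost_mode` set to `replace`, the script result takes the place of the relevance score, so for this query doc2 should score 2/2 = 1, while doc3 and doc4 should score 2/3. The cast to `double` avoids integer division. This still leaves the open question of how to count the query terms without the extra `_analyze` round trip.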
