Score influenced by field value (terms_set query)

sserena · May 21, 2019, 3:40pm

Hello

I'm new to Elasticsearch. I really like it, but so far I couldn't find a solution to one specific use case (I've been looking for quite a while):

Part of my mapping:

entities: {
  properties: {
    term: {
      type: 'keyword'
    },
    type: {
      type: 'keyword'
    },
    salience: {
      type: 'float'
    }
  }
}

Data looks like this:

{
    "term": "class",
    "type": "OTHER",
    "salience": 0.30540481209754944
},
{
    "term": "reproach",
    "type": "OTHER",
    "salience": 0.1406273990869522
},
{
    "term": "work",
    "type": "OTHER",
    "salience": 0.1406273990869522
}

I'm using a terms_set query to find the best matches for an arbitrary number of input terms. This works well and the score is reasonable.

However, I would like to calculate a new score that takes the "salience" field into account. For example, if I query only for "class", the document in which "class" has the higher "salience" should come first. The formula could be something like "_score + sum([salience of matches]) * [some_factor]". My best idea so far was to use the Painless score context, with two loops that compare the input terms against the available "entities" (and some hope that it won't be too slow). But since I couldn't find a way to access the original input terms from the script, I didn't get anywhere with this.
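To make the intended formula concrete, here is a minimal sketch in plain Python, outside Elasticsearch. The function name `rescore`, the `factor` parameter and the base score of 1.0 are made up for illustration; only the entity data comes from the example above.

```python
def rescore(base_score, entities, query_terms, factor=1.0):
    """Return base_score + factor * (sum of salience over matched entities).

    Hypothetical helper: it mirrors the formula
    "_score + sum([salience of matches]) * [some_factor]" and is not
    an Elasticsearch API.
    """
    wanted = set(query_terms)
    # Sum the salience of every entity whose term appears in the query.
    matched_salience = sum(
        e["salience"] for e in entities if e["term"] in wanted
    )
    return base_score + factor * matched_salience


# Entities of one document, copied from the sample data above.
doc_entities = [
    {"term": "class", "type": "OTHER", "salience": 0.30540481209754944},
    {"term": "reproach", "type": "OTHER", "salience": 0.1406273990869522},
    {"term": "work", "type": "OTHER", "salience": 0.1406273990869522},
]

# Assumed base score of 1.0; only "class" matches here, "voice" does not.
new_score = rescore(1.0, doc_entities, ["class", "voice"], factor=2.0)
print(new_score)
```

With these inputs only "class" contributes, so the result is the base score plus twice its salience.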

Is this possible? If so, how should I approach it? If required, changing my structure wouldn't be a problem.

For completeness' sake, here's my current query:

GET library/_search
{
	"query": {
		"function_score": {
			"query": {
				"terms_set": {
					"entities.term": {
						"terms": ["work", "voice", "errors", "impressions"],
						"minimum_should_match_script": {
							"source": "1"
						}
					}
				}
			},
			"script_score": {
				"script": {
					"source": "return _score"
				}
			}
		}
	},
	"size": 100
}

I'm using version 7.0.1.

Thank you!

mayya · June 4, 2019, 10:58am

Finding the matches and calculating scores from them in Painless sounds inefficient.
If you don't have many terms, you can take advantage of the should clauses of a bool query combined with constant_score, where the boost is the salience value, something like this:

{
    "query": {
        "bool": {
            "should": [
                {
                    "constant_score": {
                        "filter": {
                            "term": {
                                "entities.term": "class"
                            }
                        },
                        "boost" : 0.30540481209754944
                    }
                },
                {
                    "constant_score": {
                        "filter": {
                            "term": {
                                "entities.term": "reproach"
                            }
                        },
                        "boost" : 0.1406273990869522
                    }
                }
            ]
        }
    }
}
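If the list of terms changes per request, a query body of this shape can be generated rather than written by hand. The following is only a sketch: `build_salience_query` is a hypothetical helper, and the default field name `entities.term` is an assumption taken from the mapping in the question. Note that the boosts are supplied by the caller at query time.

```python
import json


def build_salience_query(term_boosts, field="entities.term"):
    """Build a bool/should query with one constant_score clause per term.

    term_boosts maps each term to the boost that clause should contribute.
    Hypothetical helper; the field name is assumed from the question's mapping.
    """
    should = [
        {
            "constant_score": {
                "filter": {"term": {field: term}},
                "boost": boost,
            }
        }
        for term, boost in term_boosts.items()
    ]
    return {"query": {"bool": {"should": should}}}


query = build_salience_query(
    {"class": 0.30540481209754944, "reproach": 0.1406273990869522}
)
# Print the request body so it can be pasted after "GET library/_search".
print(json.dumps(query, indent=4))
```

Each should clause that matches adds its boost to the document's score, so a document containing both terms scores the sum of the two boosts.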

system · July 2, 2019, 10:58am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
