Boost per search time token

karejonsson · December 5, 2017, 8:48pm

From experience, I want searches to be sensitive to subwords. By adding an extensive word list I do find the subwords, but I get undesired behavior: if the subwords are frequent or appear in two fields, they score higher than the full word. One solution would be to boost on token length, probably by the square of the length. Example:

I have Swedish texts. The word "fastland" (meaning mainland) gets analysed/tokenized into "fastland", "fast" and "land". A text in which the word "Finland" appears 5 times gets a higher score than a text in which the word "fastland" appears once.
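The example above can be reproduced with a toy model. This is only a sketch: it scores by plain term counts, not by the BM25 similarity Elasticsearch actually uses, and the word list, the `decompound` helper and the `score` function are all hypothetical names made up for illustration. It shows why "Finland" repeated five times can outrank a single "fastland" when the query is decompounded too, and how weighting each query token by the square of its length would reverse that.

```python
from collections import Counter

# Hypothetical decompounder dictionary; a real word list would be much larger.
WORD_LIST = {"fast", "land"}


def decompound(word, word_list=WORD_LIST, min_len=3):
    """Return the lowercased word followed by every dictionary subword it contains."""
    word = word.lower()
    tokens = [word]
    for start in range(len(word)):
        for end in range(start + min_len, len(word) + 1):
            part = word[start:end]
            if part in word_list and part != word:
                tokens.append(part)
    return tokens


def analyze(text):
    """Split on whitespace and decompound every word."""
    tokens = []
    for word in text.split():
        tokens.extend(decompound(word))
    return tokens


def score(query_tokens, doc_text, weight=lambda token: 1):
    """Toy score: sum of weight(token) * term frequency over the query tokens."""
    tf = Counter(analyze(doc_text))
    return sum(weight(token) * tf[token] for token in query_tokens)


doc_finland = "Finland " * 5   # "Finland" appearing 5 times
doc_fastland = "fastland"      # "fastland" appearing once
query = decompound("fastland")  # the query is decompounded as well


def length_squared(token):
    return len(token) ** 2


plain_finland = score(query, doc_finland)
plain_fastland = score(query, doc_fastland)
weighted_finland = score(query, doc_finland, length_squared)
weighted_fastland = score(query, doc_fastland, length_squared)

print(query)
print(plain_finland, plain_fastland)
print(weighted_finland, weighted_fastland)
```

In this toy model the unweighted score prefers the "Finland" text because "land" matches five times, while the length-squared weighting lets the single full-word match on "fastland" dominate.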

Which Java classes should I subclass to apply a search-time boost to an individual token?
Or is a bit of scripting in Painless the right way to go, i.e. does ctx or some other parameter contain the matching tokens?

karejonsson · December 11, 2017, 8:35am

I notice very little interest in this so I'll publish my solution, make some comments and then mark it solved.

The solution was to use different analyzers for indexing and for searching. The important difference is that the dictionary decompounder is applied only at index time. How to do this is well documented here.
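A minimal sketch of what such a setup might look like, written as a Python dict that could be sent as the body of an index-creation request. The analyzer names (`swedish_index`, `swedish_search`), the filter name (`swedish_decompounder`), the field name (`body`) and the two-word word list are assumptions for illustration, not the settings actually used in this thread. The mapping is shown in the typeless layout of newer Elasticsearch versions; older versions nest `properties` under a mapping type.

```python
import json

# Assumed names throughout: swedish_decompounder, swedish_index,
# swedish_search and the "body" field are illustrative only.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "swedish_decompounder": {
                    "type": "dictionary_decompounder",
                    # Tiny inline list for illustration; a real setup would
                    # more likely point word_list_path at a dictionary file.
                    "word_list": ["fast", "land"],
                }
            },
            "analyzer": {
                # Index time: emit the full word and its subwords.
                "swedish_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "swedish_decompounder"],
                },
                # Search time: no decompounder, so the query "fastland"
                # stays a single token and does not match on "land" alone.
                "swedish_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "body": {
                "type": "text",
                "analyzer": "swedish_index",
                "search_analyzer": "swedish_search",
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

With this arrangement a search for "land" still finds documents containing "fastland", because the subword was indexed, while a search for "fastland" is no longer broken into "fast" and "land" and therefore does not match documents that only contain "Finland".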

I made some attempts with the Painless language. My observations are the following.

Not enough variables are available in the function score callback. At least the variables appearing in the Explanation should be available. I would also have wanted the matching tokens.
The Debug class offers too little. Having a single method that works by throwing an exception was an unpleasant surprise to me.
The Groovy language should be available without plugging it in.

system · January 8, 2018, 8:35am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
