Custom score on the basis of a field value(colon separated) and boosting factor in ES 6.2.4

Custom score on the basis of a field value (colon separated) and a boosting factor

For example, I have indexed a field as:
scorevector : "Lucene:1.3 Hadoop:4.3 Elastic:6.8 Lucene:7.2"

I would like to search the index as

my_index/_search
get_custom_score : {
scorevector : "Lucene^5 Hadoop^2 Elastic^3 HDFS^1.5"
}

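The score should be returned as [(1.3+7.2)/2]*5 + 4.3*2 + 6.8*3 = 50.25.
The term "Lucene" occurs twice, so its payloads are averaged: (1.3+7.2)/2 = 4.25. "HDFS" does not occur in the field, so it adds nothing to the score.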
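
To make the expected arithmetic concrete, here is a minimal plain-Java sketch of the formula above. It does not use Elasticsearch or Lucene; the class and method names (PayloadScoreSketch, parsePayloads, score) are made up for illustration, and it assumes whitespace-separated tokens with ":" before the payload and "^" before the boost.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: plain-Java arithmetic, not Elasticsearch or Lucene code.
class PayloadScoreSketch {

    // Parses "Lucene:1.3 Hadoop:4.3" into term -> list of payload values.
    static Map<String, List<Double>> parsePayloads(String scoreVector) {
        Map<String, List<Double>> payloads = new HashMap<>();
        for (String token : scoreVector.trim().split("\\s+")) {
            int sep = token.lastIndexOf(':');
            if (sep < 0) {
                continue; // token without a payload: ignored in this sketch
            }
            String term = token.substring(0, sep);
            double value = Double.parseDouble(token.substring(sep + 1));
            payloads.computeIfAbsent(term, k -> new ArrayList<>()).add(value);
        }
        return payloads;
    }

    // Sums boost * average(payloads) over the query terms.
    // Query terms that are absent from the field contribute nothing.
    static double score(String scoreVector, String boostedQuery) {
        Map<String, List<Double>> payloads = parsePayloads(scoreVector);
        double total = 0.0;
        for (String clause : boostedQuery.trim().split("\\s+")) {
            int sep = clause.lastIndexOf('^');
            String term = sep < 0 ? clause : clause.substring(0, sep);
            double boost = sep < 0 ? 1.0 : Double.parseDouble(clause.substring(sep + 1));
            List<Double> values = payloads.get(term);
            if (values == null) {
                continue;
            }
            double sum = 0.0;
            for (double v : values) {
                sum += v;
            }
            total += boost * (sum / values.size());
        }
        return total;
    }

    public static void main(String[] args) {
        double s = score("Lucene:1.3 Hadoop:4.3 Elastic:6.8 Lucene:7.2",
                         "Lucene^5 Hadoop^2 Elastic^3 HDFS^1.5");
        System.out.println(String.format(Locale.ROOT, "%.2f", s));
    }
}
```

With the field value and query from this post, the sketch should reproduce the 50.25 worked out above, up to floating-point rounding.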

I tried a few things, as below:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "payloads": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "my_delimited_payload"
          ]
        }
      },
      "filter": {
        "my_delimited_payload": {
          "type": "delimited_payload",
          "delimiter": ":",
          "encoding": "float"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "scorevector": {
          "norms": true,
          "index": true,
          "store": false,
          "type": "text",
          "analyzer": "payloads"
        }
      }
    }
  }
}

POST my_index/_doc
{ "scorevector": "Lucene:1.3 Hadoop:4.3 Elastic:6.8 HDFS:7.2" }

As I understand it, the payload (the float value after the colon) should now be picked up during analysis and stored with each term.
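
As a rough illustration of what "float" encoding means for a token such as Lucene:1.3, the sketch below splits the token at the delimiter and turns the number into a 4-byte payload using only the JDK. This is an assumption about the general idea, not the actual Lucene implementation; the class name FloatPayloadSketch and the big-endian byte order are chosen for the example.

```java
import java.nio.ByteBuffer;

// Illustrative only: mimics a 4-byte float payload with the JDK, no Lucene classes.
class FloatPayloadSketch {

    // Encodes the float as 4 bytes (big-endian, ByteBuffer's default order).
    static byte[] encode(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    // Reads the 4 bytes back into a float.
    static float decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    public static void main(String[] args) {
        for (String token : "Lucene:1.3 Hadoop:4.3".split(" ")) {
            String[] parts = token.split(":"); // term, then payload text
            byte[] payload = encode(Float.parseFloat(parts[1]));
            System.out.println(parts[0] + " -> " + payload.length
                    + " bytes, decodes to " + decode(payload));
        }
    }
}
```

The point is only that the term ("Lucene") is what gets indexed and searched, while the number travels alongside it as a few bytes that a payload-aware query can read back at scoring time.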

I think I can make use of AveragePayloadFunction.

I need a way to get the score described above.

Could you please help me with some example code?

Any help will be appreciated.

Thanks in advance
Amit

I was able to resolve it by writing two plugins, as below:

  • MySimilarityPlugin.java : extends Plugin
    @Override
    public void onIndexModule(IndexModule im) {
        im.addSimilarity(..., new MySimilarity(...));
    }
    MySimilarity extends ClassicSimilarity, and all of its methods return 1.

  • MyQueryParserPlugin.java : extends Plugin implements SearchPlugin

  1. Override getQueries(..MyQueryBuilder...)
  2. MyQueryBuilder is the same as QueryStringQueryBuilder, but uses MyQueryParser instead of QueryStringQueryParser.
  3. MyQueryParser extends QueryStringQueryParser
    @Override
    public Query getFieldQuery(String field, String queryText, boolean quoted) {
        // Score each term by the average of its payloads in the matching document.
        return new PayloadScoreQuery(new SpanTermQuery(new Term(field, queryText)), new AveragePayloadFunction());
    }
  • The 'test' index is created as below:
    curl -XPUT 'localhost:9200/test' -H 'Content-Type: application/json' -d '{
      "settings": {
        "index": {
          "similarity": {
            "mysim": {
              "type": "mysimilarity"
            }
          },
          "analysis": {
            "analyzer": {
              "myanalyzer": {
                "filter": [
                  "mytokenfilter"
                ],
                "type": "custom",
                "tokenizer": "whitespace"
              }
            },
            "filter": {
              "mytokenfilter": {
                "type": "delimited_payload",
                "delimiter": ":"
              }
            }
          }
        }
      },
      "mappings": {
        "docs": {
          "properties": {
            "mytext": {
              "type": "text",
              "similarity": "mysim",
              "analyzer": "myanalyzer"
            }
          }
        }
      }
    }'

It's working as expected. :grinning:

Please let me know if you have a better approach.
