How to index <term, weight> pairs as <term, payload> pairs? And how to use the payload for scoring?

In my application, I have a map of <term, weight> pairs that I would like
to index into the same field, say "f", where the term should be indexed "as
is", i.e., un-analyzed, and the weight should be indexed as the payload for
that term. I only need payloads for that field, not term vectors. Each
document would have a different map of <term, weight> pairs to be indexed
as term and payload into the field "f". I am using ES 1.4, with the Java
API for both indexing and searching.
I had been doing this (indexing and scoring) directly with Lucene thus far.
However, after reading the Elasticsearch documentation, I could not find
any API, or even an approach, for how to achieve payload indexing in
Elasticsearch.
I did find some documentation on how to use scripting for extracting and
using the payloads for scoring; this part would work, but I would prefer a
Java API approach, because it would avoid the scripting performance hit.

Any help on this, especially the indexing part, would be appreciated.

-devarajaswami

Hi, I happened to have pretty much the same use case.

I used a homemade plugin to index terms with payloads. Actually, you can use the delimited payload token filter, https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-delimited-payload-tokenfilter.html, which does the same thing as mine.
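
For the indexing side, here is a minimal sketch of how a <term, weight> map could be turned into a field value for that filter. It assumes the field's analyzer is a whitespace tokenizer followed by the delimited payload filter with its default "|" delimiter and float encoding. PayloadFieldBuilder and toDelimited are made-up names for illustration, and terms containing whitespace or "|" are rejected because they would break the format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PayloadFieldBuilder {

    // Hypothetical helper: joins each <term, weight> pair as "term|weight",
    // separated by single spaces, e.g. "term1|0.5 term2|2.0".
    public static String toDelimited(Map<String, Float> weights) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : weights.entrySet()) {
            String term = e.getKey();
            // Assumption: whitespace splits tokens and '|' splits the payload,
            // so neither may appear inside a term.
            for (int i = 0; i < term.length(); i++) {
                char c = term.charAt(i);
                if (c == '|' || Character.isWhitespace(c)) {
                    throw new IllegalArgumentException(
                        "term must not contain '|' or whitespace: " + term);
                }
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(term).append('|').append(Float.toString(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new LinkedHashMap<>();
        weights.put("term1", 0.5f);
        weights.put("term2", 2.0f);
        System.out.println(toDelimited(weights));
    }
}
```

The resulting string would then be sent as the value of the payload field in the document source.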

Then I tried to use a custom similarity and override scorePayload:

@Override
public float scorePayload(int doc, int start, int end, BytesRef payload) {
  if (payload != null) {
    return PayloadHelper.decodeFloat(payload.bytes, payload.offset);
  } else {
    return 1.0F;
  }
}

Unfortunately, I never managed to get the payload taken into account in the scoring function.
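
For reference, here is a sketch of what the decode step amounts to, written without Lucene so it can be run standalone. It assumes the float payload is stored as the 4 big-endian bytes of its IEEE 754 bit pattern, which is my understanding of what PayloadHelper.encodeFloat and the float encoding of the delimited payload filter produce. PayloadFloatCodec and its methods are hypothetical names, not part of any library.

```java
import java.nio.ByteBuffer;

public class PayloadFloatCodec {

    // Writes the float as 4 bytes, big-endian (ByteBuffer's default order).
    public static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    // Reads 4 bytes starting at offset and reinterprets them as a float.
    public static float decodeFloat(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4).getFloat();
    }

    // Mirrors the null check in the similarity above: no payload means 1.0.
    public static float scorePayload(byte[] bytes, int offset) {
        return (bytes != null) ? decodeFloat(bytes, offset) : 1.0f;
    }

    public static void main(String[] args) {
        byte[] payload = encodeFloat(2.5f);
        System.out.println(scorePayload(payload, 0));
        System.out.println(scorePayload(null, 0));
    }
}
```

If the decoded values look right in isolation, the problem is more likely that the similarity is never called for the query type being used than that the payload bytes are wrong.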

I did manage to score on the payload with a script. I added a script file, scorePayload.groovy, to the server config as follows:

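// Sum the float payloads of every position of every search term.
score = 0;
for (term in search_terms) {
  termInfo = _index[field_name].get(term, _PAYLOADS | _CACHE);
  for (pos in termInfo) {
    // payloadAsFloat(1) falls back to 1 when a position has no payload.
    score = score + pos.payloadAsFloat(1);
  }
}
return log(score + 2);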
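
To make the arithmetic of the script explicit, here is the same computation as a plain Java sketch. It assumes log in the script is the natural logarithm. PayloadScriptScore is a hypothetical name, and the nested list stands in for the payloads found at each position of each search term, with null meaning a position without a payload (counted as 1, as payloadAsFloat(1) does).

```java
import java.util.Arrays;
import java.util.List;

public class PayloadScriptScore {

    // Sums all payloads over all terms and positions, then returns ln(sum + 2).
    public static double score(List<List<Float>> payloadsPerTerm) {
        double sum = 0.0;
        for (List<Float> positions : payloadsPerTerm) {
            for (Float payload : positions) {
                // A missing payload contributes the default value 1.
                sum += (payload != null) ? payload : 1.0f;
            }
        }
        return Math.log(sum + 2.0);
    }

    public static void main(String[] args) {
        List<List<Float>> payloads = Arrays.asList(
            Arrays.asList(0.5f, 1.5f),   // positions of the first term
            Arrays.<Float>asList());     // second term does not occur
        System.out.println(score(payloads));
    }
}
```

With no matching positions the sum is 0 and the result is ln(2), so the script score stays positive even for documents that carry no payload for the search terms.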

And I use the script as follows in my query:

  "query": {
    "function_score": {
      "score_mode": "multiply",
      "max_boost": 10,
      "functions": [
        {
          "script_score": {
            "script": {
              "lang": "groovy",
              "file": "scorePayload",
              "params": {
                "field_name": "tags_payload",
                "search_terms": ["term1"]
              }
            }
          }
        }
      ],
      "query": {
        "term": {
          "tags_payload": {
            "value": "term1"
          }
        }
      }
    }
  }

I'm still wondering why the similarity does not take my scorePayload into account.

By the way, have you managed to plug a custom similarity into the scoring?

I have an example of term position scoring:
https://github.com/sdauletau/elasticsearch-position-similarity