Cannot get term payloads to work


(wojtek) #1

Hi,

I've spent several days trying to achieve something like this in ES:

http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/Payloads

but I had no luck.

This is what I did (maybe all of that is unnecessary, but I couldn't find
any information on how to do this correctly):

  • Created a custom similarity and a provider (mimicking
    https://github.com/tlrx/elasticsearch-custom-similarity-provider )

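    @Override
    public float scorePayload(int docId, String fieldName, int start, int end,
                              byte[] payload, int offset, int length) {
      // Use the float stored in the term's payload as the score factor;
      // fall back to a neutral 1.0 when the term carries no payload.
      if (payload != null) {
          return PayloadHelper.decodeFloat(payload, offset);
      } else {
          return 1.0F;
      }
    }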

  • Created a custom analyzer:

              "analyzer": { 
                  "pf": { 
                      "tokenizer": "whitespace", 
                      "filter": "standard, lowercase, stop, payload" 
                  } 
              } 
    

where the payload filter just does:

    @Override
    public TokenStream create(TokenStream tokenStream) {
        // Split each token at '|' and store the part after it as a float payload.
        return new DelimitedPayloadTokenFilter(tokenStream, '|', new FloatEncoder());
    }

  • Created an index that uses that similarity for both indexing and
    searching, together with that analyzer.
  • Created a mapping that tells ES to use the new analyzer when indexing
    the "message" field.
  • Indexed a document:

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "tweet" : {
        "user" : "kimchy",
        "message" : "search|40.0"
    }
}'
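
For reference, the behaviour expected from the analysis chain on a token such as "search|40.0" can be sketched in plain Java, without Lucene on the classpath. This is only an illustration under stated assumptions: the names PayloadSketch, Token, split, encodeFloat, decodeFloat and scorePayload are made up for this sketch, the token is assumed to be split at the first delimiter, and the float is assumed to be stored as 4 big-endian bytes, which is how Lucene's PayloadHelper is understood to lay it out.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-alone sketch; it does not use any Lucene or ES classes.
class PayloadSketch {

    /** Term text plus optional payload bytes for a single token. */
    static final class Token {
        final String term;
        final byte[] payload; // null when the raw token had no delimiter

        Token(String term, byte[] payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /** Encodes a float as 4 big-endian bytes (its IEEE 754 bit pattern). */
    static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putInt(Float.floatToIntBits(value)).array();
    }

    /** Decodes the 4 big-endian bytes starting at offset back into a float. */
    static float decodeFloat(byte[] bytes, int offset) {
        return Float.intBitsToFloat(ByteBuffer.wrap(bytes, offset, 4).getInt());
    }

    /** Splits "term|weight" at the first delimiter; no delimiter means no payload. */
    static Token split(String raw, char delimiter) {
        int pos = raw.indexOf(delimiter);
        if (pos < 0) {
            return new Token(raw, null);
        }
        float weight = Float.parseFloat(raw.substring(pos + 1));
        return new Token(raw.substring(0, pos), encodeFloat(weight));
    }

    /** Same rule as the custom similarity: payload value, or 1.0 if absent. */
    static float scorePayload(byte[] payload) {
        return payload != null ? decodeFloat(payload, 0) : 1.0f;
    }

    public static void main(String[] args) {
        Token weighted = split("search|40.0", '|');
        System.out.println(weighted.term + " -> " + scorePayload(weighted.payload));

        Token plain = split("search", '|');
        System.out.println(plain.term + " -> " + scorePayload(plain.payload));
    }
}
```

If the real filter and similarity behave like this sketch, the indexed term is "search" (without the pipe) and the payload factor for it is 40.0, while a token without a payload gets the neutral factor 1.0.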

I then searched for the term "search", but judging by the query's "explain"
output, the payload is not taken into account.

Also, when I retrieve the document, the content of "message" is exactly
what I submitted for indexing (including the pipe and the payload). I'm
not sure whether this is a bad sign.

Please let me know what I'm doing wrong here.


(wojtek) #2

I made some progress. It turns out I needed to set the pf analyzer on the
_all field, so that it works properly in combination with AllTermQuery.

What should I do if I want this to work on searches restricted to the
"message" field?


(Shay Banon) #3

You can simply set the analyzer on that field and search against it
explicitly. The _all field is a special field that includes all the
other fields.

On Sat, Jun 9, 2012 at 2:42 AM, wojtek wojciech.jawor@gmail.com wrote:

I made some progress. It turns out I needed to set the pf analyzer on the
_all field, so that it works properly in combination with AllTermQuery.

What should I do if I want this to work on searches restricted to the
"message" field?


(system) #4