Count how many terms match document


(Thomas Decaux) #1

My documents have an array of strings:

"categories" : ["A", "B", "C"]

I query like this:

POST /topic/_search?explain=true
{
  "query": {
    "terms": {
      "categories": [
        "yoga",
        "meditation"
      ]
    }
  }
}

It works fine, but the score is computed via TF/IDF (1.4 * 1.9 ...). I would like a score that indicates how many categories match (1, 2, 3, etc.).

I tried constant_score, but it does not work with terms.


(Simon Willnauer) #2

Did you try constant_score and, instead of using terms, one optional bool clause per category with a term (note the singular) query? I guess this should work just fine.

Something like this:

POST /topic/_search?explain=true
{
  "query": {
    "bool": {
      "should": [
        { "constant_score": { "filter": { "term": { "categories": "yoga" } } } },
        { "constant_score": { "filter": { "term": { "categories": "meditation" } } } }
      ]
    }
  }
}
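
For a longer category list, the same request body can be generated rather than written by hand. The following is only a sketch: the helper name build_count_query is hypothetical, and it only builds the JSON body shown above without talking to a cluster. It assumes each constant_score clause keeps its default boost of 1.0 and that the bool query sums the scores of the matching should clauses, so the resulting score should equal the number of matching categories. Older versions that apply a coordination factor or query normalization may scale that value.

```python
import json


def build_count_query(field, values):
    """Build a bool query with one constant_score/term clause per value.

    Hypothetical helper: it mirrors the request body above and does not
    contact Elasticsearch.
    """
    clauses = [
        {"constant_score": {"filter": {"term": {field: value}}}}
        for value in values
    ]
    # Each should clause is optional; a document's score is built from
    # the clauses it matches.
    return {"query": {"bool": {"should": clauses}}}


body = build_count_query("categories", ["yoga", "meditation"])
print(json.dumps(body, indent=2))
```

The printed body can be sent as-is to POST /topic/_search.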


(Thomas Decaux) #3

Yeah, this is working, but I would prefer to rely on the Lucene tokenizer. I could get the tokens via the ES API and build the query that way; it is a good approach if there is no other way.

I know it is possible to change the similarity algorithm via ES settings. Is it not also possible to change or tune the TF/IDF algorithm?


(Simon Willnauer) #4

I am not sure what you mean; you used terms in your example. Are you referring to lowercasing etc.? You can use a match query instead in the same example and you should be fine.
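
A rough sketch of that variant, with match swapped in for term so that each category string goes through the field's analyzer (lowercasing and so on) at query time. The helper name build_match_count_query is hypothetical, the code only builds the request body, and it assumes a version where a match query is accepted inside the constant_score filter.

```python
import json


def build_match_count_query(field, values):
    """Build a bool query with one constant_score/match clause per value.

    Hypothetical helper: unlike term, match analyzes the input text with
    the field's analyzer before matching.
    """
    clauses = [
        {"constant_score": {"filter": {"match": {field: value}}}}
        for value in values
    ]
    return {"query": {"bool": {"should": clauses}}}


match_body = build_match_count_query("categories", ["Yoga", "Meditation"])
print(json.dumps(match_body, indent=2))
```

Each clause still contributes a constant score when it matches, so the idea of counting matching categories carries over from the term version.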

