Push common search results down ('common' being subjective)

We've been trying to improve our search results. The query started out as a simple match_all, plus optional term matches depending on user action, sorted by geo distance.

I've switched it to use scoring instead: a function_score whose script_score puts each doc into a 'distance group' (an integer score), plus a random_score that randomizes the results within each group. Just for completeness, this is the general idea:

{
  "query": {
    "bool": {
      "must": [
        {
          "function_score": {
            "query": {
              "match_all": {}
            },
            "functions": [
              {
                "random_score": {
                  "seed": 10,
                  "field": "_seq_no"
                },
                "weight": 1
              },
              {
                "script_score": {
                  "script": {
                    "source": "double d; if (doc['location.geo'].size() != 0) { d = doc['location.geo'].planeDistance(params.lat, params.lng) * 0.000621371; } else { return 0.1; } if (d >= (params.grouping * params.fromScore)) { return 0.2; } else { return params.fromScore - Math.floor(d / params.grouping); }",
                    "params": {
                      "lat": 27.95,
                      "lng": -82.48,
                      "fromScore": 100,
                      "grouping": 60
                    }
                  }
                }
              }
            ],
            "boost_mode": "sum",
            "score_mode": "sum"
          }
        },
        {
          "match_all": {}
        }
      ]
    }
  },
  "aggregations": {
    // ...
  }
}

If you don't want to read through the query, all it's doing is, e.g.:

  • From 0 up to 60 miles away, score 100, then random_score randomizes those results (100-100.999...)
  • From 60 up to 120 miles away, score 99, then random_score randomizes those results (99-99.999...)
  • and so on, dropping one point for every further 60 miles
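
For anyone who finds the one-line script hard to follow, here is a rough Python sketch of the same bucketing. It only mirrors the arithmetic of the script above; the function name is made up for illustration, the distance is assumed to already be in miles, and None stands in for a doc with no location.

```python
import math


def distance_group_score(distance_miles, from_score=100, grouping=60):
    """Illustrative mirror of the script_score bucketing (hypothetical helper).

    distance_miles: distance to the search point in miles, or None when the
    document has no location (an assumption standing in for the missing-field case).
    """
    if distance_miles is None:
        return 0.1  # no location: lowest score
    if distance_miles >= grouping * from_score:
        return 0.2  # beyond the last group: just above docs with no location
    # One point lost for every full `grouping` miles of distance.
    return from_score - math.floor(distance_miles / grouping)
```

The random_score is then added on top of this value, so every doc in a group lands somewhere between the group's integer score and the next integer.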

However, this still has the problem that sellers with MANY results still dominate the groups. That is, within each group, if a seller has 700 items and another seller only has 10, the seller with 700 results will vastly dominate all the results.

What would be a fast, simple way to prevent this? Ideally, I would stop scoring a seller's documents once a particular "sellerId" has been seen N times. However, I can't find any way to reference other documents when scoring in a script.
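
To make the "seen N times" idea concrete, this is roughly the behavior I'm after, sketched in Python as a client-side pass over hits that have already been fetched. It is not something Elasticsearch does for me: the function name and the flat hit shape (a dict with "_score" and "sellerId" at the top level) are assumptions for illustration, and it only reorders the page it is given, so it would not fix pagination by itself.

```python
import math
from collections import Counter
from itertools import groupby


def demote_repeat_sellers(hits, max_per_seller=2):
    """Within each distance group, push a seller's extra hits to the end.

    hits: list of dicts sorted by "_score" descending (hypothetical shape).
    A distance group is taken to be the integer part of the score.
    """
    reordered = []
    for _, group in groupby(hits, key=lambda h: math.floor(h["_score"])):
        seen = Counter()
        kept, overflow = [], []
        for hit in group:
            seen[hit["sellerId"]] += 1
            if seen[hit["sellerId"]] <= max_per_seller:
                kept.append(hit)
            else:
                overflow.append(hit)  # seller already seen N times in this group
        reordered.extend(kept + overflow)
    return reordered
```

With max_per_seller=2, a group ordered a, a, a, b comes back as a, a, b, a, while hits in other distance groups keep their place.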

Any suggestions would be greatly appreciated. If the only way is to perform two queries (one to grab an aggregation of the top sellerIds and their total counts, and then one to actually query for documents), then so be it, but I still can't think of the best way to handle even that.
