Usual sorting with elasticsearch (boost/sort based on other documents)

The website in question holds 100k+ documents from about 300 different "sellers". Each seller has one or more locations, which are stored in the documents to be searched.

Everything works fine except when we try to do geo-searches. The "large" sellers dominate some search results because (a) just by chance, they are slightly closer to the given geopoint, and (b) they have a lot of results. People will happily travel ~100 miles for what we are selling, so a mile or two doesn't make much of a difference in reality, but it obviously does in Elasticsearch.

We have a simple query that currently does match all + sort by geo distance. What I would like is for the results to end up something like this:

Take the distance, divide it by X, round, then multiply by X again (so we have 0, 100, 200, etc.; the steps may end up being 10, 20, 40, whatever). You get the idea: an "approximate distance".

Sort by or score based on this approximate distance so that the closest are approximately first.
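To make the rounding step concrete, here is a minimal Python sketch of the "approximate distance" calculation. The function name and the `bucket_size` parameter (the "X" above) are just illustrative names, and miles are assumed as the unit.

```python
import math


def approx_distance(distance_miles, bucket_size):
    """Snap a distance to the nearest multiple of bucket_size (the "X")."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    # floor(x + 0.5) rounds halves up; Python's built-in round() would
    # round halves to the nearest even number instead.
    return math.floor(distance_miles / bucket_size + 0.5) * bucket_size
```

With a bucket size of 20, distances of 0.0001 and 9.9 both collapse to 0, so a mile or two no longer decides the order.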

Then, somehow sort so that we get a "round robin" of each "sellerId". So, the results would be something like this:

{"sellerId": 123, "approxDistance": 0 }
{"sellerId": 456, "approxDistance": 0 }
{"sellerId": 789, "approxDistance": 0 }
{"sellerId": 123, "approxDistance": 0 }
{"sellerId": 456, "approxDistance": 0 }
{hundreds more of 123, 456, 789 sorted like this quasi-randomly or whatnot}
{"sellerId": 333, "approxDistance": 20 }
{"sellerId": 444, "approxDistance": 20 }
{"sellerId": 333, "approxDistance": 20 }

Currently, we have something like this:

{"sellerId": 123, "distance": 0.0001 }
{"sellerId": 123, "distance": 0.0001 }
{"sellerId": 123, "distance": 0.0001 }
{"sellerId": 123, "distance": 0.0001 }
{hundreds more results}
{"sellerId": 123, "distance": 0.0001 }
{"sellerId": 456, "distance": 0.0002 }
{"sellerId": 456, "distance": 0.0002 }

...

The solution doesn't have to be perfect. I played with changing the "sort" to "scoring" and using the gauss decay function in function_score; however, this still doesn't "spread out" the results enough.

It doesn't need to be perfectly sellerOne, sellerTwo, sellerThree, sellerOne, sellerTwo, ... in this fashion. If we could just get them "scattered" a little bit, it would be perfect.

What is a way I can score, or sort these, so that each group of 10-20 results has a good "mix" of a specific field? (that will also work in addition to being sorted/scored by the "approx. distance")

Is this even possible?

Even if I need to change the application code slightly and perform two queries, it can be done, although the simpler the better. It feels like I need to be able to sort/score each document depending on its proximity to other documents, although again, simple is good, and that sounds quite complex, if it is possible at all.
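For the application-code route, a rough Python sketch of the post-processing might look like the following. It assumes the hits have already been fetched and each carries a "sellerId" and an "approxDistance" (as in the desired output above); the function name is made up for illustration. It only reorders the hits it is given, so it would need a generously sized page of results to work on.

```python
from collections import defaultdict, deque
from itertools import groupby


def interleave_by_seller(hits):
    """Order hits by approxDistance, round-robin across sellers per bucket."""
    # sorted() is stable, so hits keep their incoming order within a bucket.
    ordered = sorted(hits, key=lambda h: h["approxDistance"])
    result = []
    for _, bucket in groupby(ordered, key=lambda h: h["approxDistance"]):
        # One queue per seller; dicts keep sellers in first-seen order.
        queues = defaultdict(deque)
        for hit in bucket:
            queues[hit["sellerId"]].append(hit)
        # Take one hit from each seller in turn until the bucket is empty.
        while queues:
            for seller in list(queues):
                result.append(queues[seller].popleft())
                if not queues[seller]:
                    del queues[seller]
    return result
```

Sellers with many hits still fill the tail of each bucket, but the top of every bucket shows one hit per seller before any seller repeats.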

Several pointers I can suggest to help you with your problem:

The function_score query has a random_score function that produces a score in [0, 1) from a seed and the value of a per-document field. It can be used with the _seq_no field, since its value is unique for each document within a shard.

You can combine random_score with your distance score to get a final score.
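As a rough sketch of that combination, the request body could be built like this in Python. The geo field name "location", the helper name `build_query`, and the scale, offset and weight values are all assumptions to be tuned; only the function_score, gauss and random_score constructs themselves come from Elasticsearch.

```python
def build_query(lat, lon, seed, scale="100mi", offset="10mi", random_weight=0.3):
    """Build a function_score body mixing distance decay with random jitter."""
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": [
                    {
                        # Distance decay; "location" is an assumed field name.
                        # Everything within `offset` gets the full score of 1.
                        "gauss": {
                            "location": {
                                "origin": {"lat": lat, "lon": lon},
                                "scale": scale,
                                "offset": offset,
                            }
                        }
                    },
                    {
                        # Per-document jitter in [0, 1), scaled by the weight.
                        "random_score": {"seed": seed, "field": "_seq_no"},
                        "weight": random_weight,
                    },
                ],
                "score_mode": "sum",      # add the two function scores
                "boost_mode": "replace",  # ignore the match_all query score
            }
        }
    }
```

Reusing the same seed for a given user session should keep the ordering stable across pages, while a larger `random_weight` scatters sellers more at the cost of distance accuracy.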
