Weighted Probability Sort Based on Percentage of Total

Hi everyone,

I have an application that is used as a listing for course providers/companies, with a geolocation aspect to it. In short, providers can place bids on pre-defined areas of the website, to compete against other providers within the same area. The higher the provider's sum of bids relative to the sum of all bids for the area, the higher the probability of them being displayed as a featured/promoted course provider on that area.

For the most part I have this working on Elasticsearch 5.2.2 using .NET and NEST. Course providers are filtered by geolocation, and the results that come back are correct. Where I get stuck is calculating the probability percentages and then applying a weighted random sort based on those percentages.

I have an index containing all the course providers and their courses. Each course provider document contains a nested object holding the bids for that specific provider. The mapping for my course provider object is as follows:

{
  "mappings": {
    "providerindexentity": {
      "properties": {
        "companyName": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "id": {
          "type": "integer"
        },
        "providerBids": {
          "type": "nested",
          "properties": {
            "area": {
              "type": "integer"
            },
            "bid": {
              "type": "double"
            },
            "cityId": {
              "type": "integer"
            },
            "cityName": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      }
    }
  }
}

I started by testing this with aggregations and got them to return the correct values: the sum of each provider's bids (a terms aggregation on the provider id, with a nested sum inside it) and the total of all provider bids (a nested sum of all bid values). That gave me two separate aggregations with the correct values in them, so all that remained was to bring them together and calculate each provider's percentage of the total. For this I used a bucket script aggregation. The query I executed is as follows:

{
  "query" : {
    "bool" : {
      "must" : [
        {
          "nested" : {
            "query" : {
              "terms" : {
                "providerBids.area" : [
                  2
                ],
                "boost" : 1.0
              }
            },
            "path" : "providerBids",
            "ignore_unmapped" : false,
            "score_mode" : "avg",
            "boost" : 1.0
          }
        },
        {
          "geo_distance" : {
            "providerLocation" : [
              103.8,
              1.3667
            ],
            "distance" : 250000.0,
            "distance_type" : "sloppy_arc",
            "validation_method" : "STRICT",
            "ignore_unmapped" : false,
            "boost" : 1.0
          }
        }
      ],
      "disable_coord" : false,
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "script_fields" : { },
  "sort" : [ ],
  "aggregations" : {
    "bids" : {
      "nested" : {
        "path" : "providerBids"
      },
      "aggregations" : {
        "bid_totals" : {
          "sum" : {
            "field" : "providerBids.bid"
          }
        }
      }
    },
    "providers" : {
      "terms" : {
        "field" : "id",
        "size" : 10,
        "min_doc_count" : 1,
        "shard_min_doc_count" : 0,
        "show_term_doc_count_error" : false,
        "order" : [
          {
            "_count" : "desc"
          },
          {
            "_term" : "asc"
          }
        ]
      },
      "aggregations" : {
        "provider_bids" : {
          "nested" : {
            "path" : "providerBids"
          },
          "aggregations" : {
            "provider_bid_totals" : {
              "sum" : {
                "field" : "providerBids.bid"
              }
            }
          }
        },
        "provider_bid_percentage" : {
          "bucket_script" : {
            "buckets_path" : {
              "provider_bid_totals" : "providers>provider_bids>provider_bid_totals",
              "bid_totals" : "bids>bid_totals"
            },
            "script" : {
              "inline" : "(params.provider_bid_totals / params.bid_totals) * 100",
              "lang" : "painless"
            },
            "gap_policy" : "skip"
          }
        }
      }
    }
  }
}

When I run this I get an exception suggesting that the bucket script can't find my top-level bids aggregation:

No aggregation found for path [bids>bid_totals]

And this is the point where I have been stuck for a few days now.
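
To make the intended calculation concrete, here is a minimal Python sketch of the same percentage computed client-side from the two aggregations that already return correct values. It assumes the response follows the usual shape for the aggregation names used in the query above (bids, bid_totals, providers, provider_bids, provider_bid_totals). The function name bid_percentages and the sample numbers are hypothetical and only for illustration.

```python
def bid_percentages(aggregations):
    """Map each provider id to its percentage of the total bids.

    `aggregations` is assumed to be the "aggregations" part of the search
    response, using the aggregation names from the query in this post.
    """
    total = aggregations["bids"]["bid_totals"]["value"]
    if not total:
        # No bids at all: avoid dividing by zero.
        return {}
    percentages = {}
    for bucket in aggregations["providers"]["buckets"]:
        provider_total = bucket["provider_bids"]["provider_bid_totals"]["value"]
        percentages[bucket["key"]] = provider_total / total * 100
    return percentages


# Hypothetical response fragment, made up for illustration only.
sample = {
    "bids": {"doc_count": 4, "bid_totals": {"value": 200.0}},
    "providers": {
        "buckets": [
            {"key": 1, "doc_count": 1,
             "provider_bids": {"doc_count": 3,
                               "provider_bid_totals": {"value": 150.0}}},
            {"key": 2, "doc_count": 1,
             "provider_bids": {"doc_count": 1,
                               "provider_bid_totals": {"value": 50.0}}},
        ]
    },
}

print(bid_percentages(sample))  # {1: 75.0, 2: 25.0}
```

This only illustrates the arithmetic. It does not resolve the bucket script error itself.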

So my objective is to use this percentage in my eventual sort, so that I can return a set of x provider records in random order, where each provider's probability of ranking highly is proportional to its share of the total sum of bids.
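
As an illustration of the kind of ordering meant here, the following Python sketch performs a weighted random sort outside Elasticsearch. It uses the Efraimidis-Spirakis key method, which is a technique swapped in for illustration and not something the current setup does: each provider draws a uniform random number u and is ranked by u ** (1 / weight), so a provider's chance of landing first is proportional to its weight. The function name weighted_random_order and the example weights are hypothetical.

```python
import random


def weighted_random_order(weights, rng=random):
    """Return provider ids in a weighted random order.

    `weights` maps provider id -> non-negative weight (for example, the
    provider's percentage of total bids). Providers with a weight of zero
    or less are left out. `rng` is any object with a random() method.
    """
    keyed = []
    for provider_id, weight in weights.items():
        if weight <= 0:
            continue
        # Efraimidis-Spirakis key: larger weights push the key toward 1.
        key = rng.random() ** (1.0 / weight)
        keyed.append((key, provider_id))
    # Highest key first.
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [provider_id for _, provider_id in keyed]


# Hypothetical weights: provider 1 holds 75% of the bids, provider 2 holds 25%.
order = weighted_random_order({1: 75.0, 2: 25.0})
print(order)  # order varies between runs; provider 1 is first about 75% of the time
```

Taking the first x ids from the returned list gives the x featured providers for one page view.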

Am I on the right track to achieve this, or should I rethink my approach? I am fairly new to Elasticsearch and would appreciate some experienced input on reaching my eventual goal.

Thanks in advance
