Inconsistent first result when changing `size` in script sorted search

I am trying to use a Painless script sort and am seeing different results when I change the size value of the query. Since sorting happens before paging, I would expect the same query with a different size to always produce the same sort values.

In the context of an ecommerce marketplace, the script sort is used to sort the results into chunks of products made by different sellers, then sort inside those chunks by "reputationScore".
My sort looks like:

sort: [{
  _script: {
    script: {
      source: `
        def sellerId = doc['sellerId'].value.toString();
        def val = params.sellerMap[sellerId];
        if (val == null) {
          params.sellerMap[sellerId] = 0;
          return 0;
        } else {
          params.sellerMap[sellerId] = val + 1;
          return val + 1;
        }`,
      params: {
        sellerMap: {},
      },
    },
    type: 'number',
    order: 'asc',
  },
}, {
  reputationScore: 'desc',
}]

The full query is not much more complicated:

GET local_variant_search/_search
{
  "sort": [
    {
      "_script": {
        "script": {
          "source": "\n                def sellerId = doc['sellerId'].value.toString();\n                def val = params.sellerMap[sellerId];\n                if (val == null) {\n                  params.sellerMap[sellerId] = 0;\n                  return 0;\n                } else {\n                  params.sellerMap[sellerId] = val + 1;\n                  return val + 1;\n                }",
          "params": {
            "sellerMap": {}
          }
        },
        "type": "number",
        "order": "asc"
      }
    },
    {
      "reputationScore": {
        "order": "desc"
      }
    }
  ],
  "size": 1000,
  "from": 0
}

Using "size": 1, "size": 10, and "size": 1000 all produce different first results on my data set. Notably, results often have a script sort value of 1 without another product with the same sellerId being assigned 0. This stops happening when size matches the size of the result set.

Is there any insight about how these script sorts are executed that can help me understand this behavior?

Hi @_Matthew_Pavlinsky,
Using params to pass values between invocations of the script is not supported. params should be treated as read-only. Sort operates on a per-document basis.

In order to support passing values between invocations of the script during sorting, Elasticsearch would have to move a lot of data between nodes and shards.

Our current implementation resets params between segments, which may be what you're running into, but that behavior is not guaranteed.
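To make the problem concrete, here is a toy simulation in plain JavaScript. It is not Elasticsearch code. It assumes, purely for illustration, that the script runs once per document within a partition (think segment or shard) and that the map starts empty for each partition, as described above. The function name scriptSortValues and the sample documents are made up.

```javascript
// Toy model of the stateful sort script: returns 0 for the first document
// seen for a seller, 1 for the second, and so on, using a shared map.
function scriptSortValues(docs) {
  const sellerMap = {}; // plays the role of params.sellerMap
  return docs.map((doc) => {
    const seen = sellerMap[doc.sellerId];
    const value = seen === undefined ? 0 : seen + 1;
    sellerMap[doc.sellerId] = value;
    return { ...doc, sortValue: value };
  });
}

// Hypothetical documents, all from the same seller.
const docs = [
  { id: "a", sellerId: "s1" },
  { id: "b", sellerId: "s1" },
  { id: "c", sellerId: "s1" },
];

// One partition: the counter runs 0, 1, 2 as the script intends.
const single = scriptSortValues(docs).map((d) => d.sortValue);

// Two partitions, each starting from an empty map: the counter restarts,
// so two documents from the same seller both receive 0.
const split = [docs.slice(0, 2), docs.slice(2)]
  .flatMap((part) => scriptSortValues(part))
  .map((d) => d.sortValue);

console.log(single); // [ 0, 1, 2 ]
console.log(split);  // [ 0, 1, 0 ]
```

The sort value here depends on how the documents are partitioned and in which order they are visited, not on the document alone, so it cannot serve as a stable sort key.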

Thanks for the quick and informative reply, @stu!

All that sounds reasonable. I'll start looking for different approaches to rewrite the script sort. Basically, I am trying to make an Elasticsearch sort that behaves like an SQL window function:

ORDER BY ROW_NUMBER() OVER (PARTITION BY "sellerId")

I'll do some more research.
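One possible direction, sketched in plain JavaScript on the client rather than inside Elasticsearch: fetch the hits without the script sort, then compute the per-seller row number after the fact and order by it. This assumes the whole candidate set fits in a single response (as with the size of 1000 above) and that each hit carries sellerId and reputationScore. The function name interleaveBySeller and the sample hits are hypothetical.

```javascript
// Emulates ordering by a per-seller row number, where rows inside each
// seller are numbered by reputationScore descending (starting at 0).
function interleaveBySeller(hits) {
  // Copy before sorting so the caller's array is not mutated.
  const byScore = [...hits].sort(
    (a, b) => b.reputationScore - a.reputationScore
  );

  // Assign each hit its position among the hits of the same seller.
  const counts = new Map();
  const ranked = byScore.map((hit) => {
    const rowNumber = counts.get(hit.sellerId) || 0;
    counts.set(hit.sellerId, rowNumber + 1);
    return { ...hit, rowNumber };
  });

  // Primary key: row number ascending. Tie-break: reputationScore descending.
  return ranked.sort(
    (a, b) =>
      a.rowNumber - b.rowNumber || b.reputationScore - a.reputationScore
  );
}

// Hypothetical hits for illustration.
const hits = [
  { id: "p1", sellerId: "s1", reputationScore: 90 },
  { id: "p2", sellerId: "s1", reputationScore: 80 },
  { id: "p3", sellerId: "s2", reputationScore: 70 },
  { id: "p4", sellerId: "s2", reputationScore: 60 },
  { id: "p5", sellerId: "s3", reputationScore: 50 },
];

const ordered = interleaveBySeller(hits);
console.log(ordered.map((h) => h.id).join(", ")); // p1, p3, p5, p2, p4
```

Because the row number is computed over the complete list in one place, the result no longer depends on how the documents are spread across shards or segments. The trade-off is that paging has to happen on the client after the reordering.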
