Is this a good use-case for native scripting?

Hi,

We use Elasticsearch to store our database of product information. Recently we added the ability to filter and sort by stock availability. I opted not to store the stock information directly in the document: it would bloat document size (a product can be available in hundreds of locations, while the product info itself is quite small), and every change in stock would need to be synced, requiring a lot of re-indexing. Instead, I implemented a custom plugin that periodically pulls the stock data from our API (packed as 12 bytes per record: product ID, location ID and quantity) and stores it in an in-memory LongIntHashMap. Filtering and sorting are then done by a custom ScriptEngine that exposes two scripts: a FilterScript for filtering and a SearchScript for sorting. Both look up the stock hashmap using IDs read from doc values. An example of the FilterScript:

    private class Script(
        private val orgIDs: LongArray,
        private val stock: LongIntHashMap,
        params: Map<String, Any>,
        lookup: SearchLookup,
        leafContext: LeafReaderContext
    ) : FilterScript(params, lookup, leafContext) {

        override fun execute(): Boolean {
            val productID = (doc["product_id"] as ScriptDocValues.Longs)[0]

            for (orgID in orgIDs) {
                // the key is a 64-bit composite of two 32-bit values
                val key = (productID shl 32) or (orgID and 0xffffffffL)

                if (stock.containsKey(key)) {
                    return true
                }
            }

            return false
        }
    }

The script can then be executed by writing your query like this:

    POST /_search
    {
        "query": {
            "bool": {
                "filter": [
                    {
                        "script": {
                            "script": {
                                "lang": "stock",
                                "source": "filter_stock",
                                "params": {
                                    "organization_unit_ids": [1, 2, 3]
                                }
                            }
                        }
                    }
                ]
            }
        }
    }
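
For anyone who wants to sanity-check the key layout outside of Elasticsearch, here is a minimal standalone sketch. It assumes each 12-byte record is three big-endian 32-bit integers (the actual wire format may differ), and it uses a plain `HashMap` in place of the LongIntHashMap so it runs without extra dependencies. The names `packKey`, `decodeStock` and `inStockAnywhere` are made up for illustration and are not part of the plugin.

```kotlin
import java.nio.ByteBuffer

// Pack two 32-bit IDs into one 64-bit key, mirroring the filter script.
fun packKey(productId: Long, orgId: Long): Long =
    (productId shl 32) or (orgId and 0xffffffffL)

// Recover the two halves of a packed key.
fun unpackProductId(key: Long): Long = key ushr 32
fun unpackOrgId(key: Long): Long = key and 0xffffffffL

// ASSUMPTION: each record is three big-endian 32-bit ints
// (product ID, location ID, quantity). The real format may differ.
fun decodeStock(bytes: ByteArray): Map<Long, Int> {
    require(bytes.size % 12 == 0) { "expected a multiple of 12 bytes" }
    val buf = ByteBuffer.wrap(bytes) // ByteBuffer is big-endian by default
    val stock = HashMap<Long, Int>(bytes.size / 12)
    while (buf.remaining() >= 12) {
        // Treat the IDs as unsigned 32-bit values.
        val productId = buf.getInt().toLong() and 0xffffffffL
        val locationId = buf.getInt().toLong() and 0xffffffffL
        val quantity = buf.getInt()
        stock[packKey(productId, locationId)] = quantity
    }
    return stock
}

// Same check as the filter: is there an entry for any of the given units?
fun inStockAnywhere(stock: Map<Long, Int>, productId: Long, orgIds: LongArray): Boolean =
    orgIds.any { stock.containsKey(packKey(productId, it)) }

fun main() {
    // Two sample records: product 42 at unit 2 (qty 7), product 43 at unit 9 (qty 1).
    val bytes = ByteBuffer.allocate(24)
        .putInt(42).putInt(2).putInt(7)
        .putInt(43).putInt(9).putInt(1)
        .array()
    val stock = decodeStock(bytes)
    println(inStockAnywhere(stock, 42L, longArrayOf(1, 2, 3))) // true
    println(inStockAnywhere(stock, 43L, longArrayOf(1, 2, 3))) // false
}
```

Like the filter script, this only tests for the presence of a key, so it treats a record with quantity 0 as "in stock" unless such records are dropped before they reach the map.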

This approach seems to work well, but does it make you recoil in horror, or is it a sensible method? It feels a bit over-engineered, but compared to keeping the stock in sync on the documents it's actually less complicated in practice. I'm not sure how it compares performance-wise; perhaps the way I access doc values inside a script is expensive? The alternative, though, would be storing dozens to hundreds of nested documents per product document and accessing those in a function score query whenever I want to use stock for scoring.

Sorry that I don't have a concrete question; I'm just looking for some general input, and any is appreciated 🙂

It does make me cringe a bit, but it does look like a semi-reasonable use of native scripts. It's not that different from, e.g., the geoip plugin having its own side database mapping IPs to locations. The background syncing gave me pause, but it is much better than doing synchronous calls on every script execution (which I have seen many users try to do), as long as proper care is taken not to block script execution during a concurrent update.
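
One common way to get that non-blocking property is to have the background refresher build a complete new map and then publish it with a single atomic reference swap, so script executions only ever read a finished snapshot. The following is a minimal sketch under that assumption, not a description of how the plugin above actually does it; `StockHolder` is a hypothetical name, and a plain `Map<Long, Int>` stands in for the LongIntHashMap.

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Holds the current stock snapshot. Readers never observe a half-built map,
// provided a map is not mutated after it has been published.
class StockHolder {
    private val current = AtomicReference<Map<Long, Int>>(emptyMap())

    // Called on the search path: one volatile read, no locking.
    fun snapshot(): Map<Long, Int> = current.get()

    // Called by the background refresher once the new map is fully built.
    fun publish(fresh: Map<Long, Int>) {
        current.set(fresh)
    }
}

fun main() {
    val holder = StockHolder()
    val before = holder.snapshot()

    // Simulate one refresh cycle: build the new map off to the side, then swap.
    val fresh = mapOf(((42L shl 32) or 2L) to 7)
    holder.publish(fresh)

    val after = holder.snapshot()
    println(before.isEmpty()) // true: the old snapshot is untouched by the swap
    println(after.size)       // 1
}
```

A script that grabs `snapshot()` once when it is constructed keeps a consistent view for its whole execution, and the old map is garbage-collected once no in-flight searches reference it. The cost is briefly holding two copies of the stock data in memory during each refresh.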
