Compilation rate limit for Painless scripts

Hi team,

For one of our update use cases, we were benchmarking Painless scripts to see how far we can scale them. In our experiments I had 200 scripts, each differing from the others only by a '\n' character, and each roughly 2 KB to 5 KB in size.

  1. Initially we embedded the scripts on the fly in the API request, with the following syntax:
POST _bulk
{"update": {"_id": "0000009035", "_index": "t1_item", "retry_on_conflict": 3}}
{"scripted_upsert": "true", "script": {"lang": "painless", "source": "\n ...<snip>.. \n", "params": {}}}
{"update": {"_id": "0000009036", "_index": "t2_item", "retry_on_conflict": 3}}
{"scripted_upsert": "true", "script": {"lang": "painless", "source": "\n\n ...<snip>.. \n", "params": {}}}

I saw the requests fail immediately because the script compilation rate limit was triggered:

GET _nodes/stats?filter_path=nodes.*.script | jq '.nodes[].script' | grep comp
  "compilations": 917,
  "compilations": 5434,
  "compilations": 344,
  "compilations": 944,
  "compilations": 210,

The counters above were captured before and after the run; the difference in the total compilation count came to roughly 75 when the error occurred.
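For readability, one way to sum the per-node counters before and after a run (the jq expressions are illustrative, not the exact commands we ran):

# total compilations across all nodes
GET _nodes/stats?filter_path=nodes.*.script | jq '[.nodes[].script.compilations] | add'
# total cache evictions, to see how much of that is re-compilation after eviction
GET _nodes/stats?filter_path=nodes.*.script | jq '[.nodes[].script.cache_evictions] | add'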

  2. We then resorted to storing these 200 scripts, on the assumption that stored scripts are pre-compiled, so that invoking them would only incur cache evictions and script loading:
POST _bulk
{"update": {"_id": "0000000123", "_index": "t2_item", "retry_on_conflict": 3}}
{"scripted_upsert": "true",  "script": {"id": "item-update-script-106", "params": {""}}}
{"update": {"_id": "0000007876", "_index": "t3_item", "retry_on_conflict": 3}}
{"scripted_upsert": "true",  "script": {"id": "item-update-script-32", "params": {""}}}

However, we still see the bulk update getting rejected, with the compilation counters again increasing by roughly 75. Could you shed some light on why our assumption was incorrect? Does this mean there is no difference between stored scripts and scripts embedded in the API, apart from the syntax?
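For reference, each script was stored through the stored scripts API along these lines (the id matches one used above, but the source shown is a trivial placeholder rather than our real 2 KB script):

PUT _scripts/item-update-script-106
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.counter = ctx._source.counter == null ? 1 : ctx._source.counter + 1"
  }
}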

  3. I am curious what the consequence is of keeping a high value for the update context, say script.context.$CONTEXT.max_compilations_rate = 1000/1s (see the sketch after this list)?
  4. From the same document: for ingest contexts, the default script compilation rate is unlimited. If we update our docs via ingest processor scripts, will there be a performance impact with 1000 updates every second? The link below seems to suggest that re-compilation happens for every doc parsed by the ingest pipeline. So why is this limit lifted for the ingest context alone? And conversely, why was the limit applied to the other contexts?
    Script processor | Elasticsearch Reference [master] | Elastic
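Regarding question 3: in later 7.x releases the per-context settings only take effect once script.max_compilations_rate is switched to use-context, so raising the update-context limit would look roughly like this (the 1000/1s value is just the figure from the question, not a recommendation):

PUT _cluster/settings
{
  "persistent": {
    "script.max_compilations_rate": "use-context",
    "script.context.update.max_compilations_rate": "1000/1s"
  }
}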

Also came across this thread:

In my measurements, where I bumped up max_compilations_rate manually via persistent cluster settings, I noticed significant degradation in update latency. In my test I am running 10000 updates/minute, and all 10000 push a unique Painless script.
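For context, the bump was done with the standard dynamic cluster setting; the value shown below is only illustrative, not the exact one from my test:

PUT _cluster/settings
{
  "persistent": {
    "script.max_compilations_rate": "500/1m"
  }
}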

Also, I am using Elasticsearch 7.5, and I was not able to change cache_size or cache_expire dynamically; I was able to change only the max_compilations_rate setting. These attributes are context-based in later releases of Elasticsearch, but we are not planning to move to a later version at the moment.
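That is consistent with 7.5, where script.cache.max_size and script.cache.expire are static node settings and have to be set in elasticsearch.yml (the values below are placeholders, not our configuration), while only the compilation rate is a dynamic cluster setting:

# elasticsearch.yml on each node -- static in 7.5, so changing it requires a node restart
script.cache.max_size: 400
script.cache.expire: 10m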

We are concluding that cache eviction/reload comes with compilation overhead even for stored scripts. One logical explanation I can think of is that if a stored script is not in the cache, it may have been updated in the backend in the meantime, so it has to be recompiled before being loaded into the cache again. That's the only explanation I can think of for recompiling a stored script every time it is reloaded into the cache.

Compilation overhead results in much slower updates, impacting update latency by an order of magnitude.

The script cache in 7.5 is shared across contexts. The same cache is used for stored scripts and for scripts submitted with requests.

We recommend sizing your cache appropriately for the number of scripts you'll be using.

Increasing max_compilations_rate will lead to performance degradation if you are using an undersized cache, as you're thrashing your cache by needlessly recompiling scripts.

Thanks @stu. A quick follow-up on this: I tried dynamically setting the cache limit in 7.9, and it threw an exception saying the setting is not dynamically updatable. Could you confirm this is true? Or point me to a dynamically settable option for the cache limit in the default context?

cache_max_size is dynamically settable for each context if you switch script.max_compilations_rate to use-context.

It is not dynamic if you're using the default context; that would require a change to your elasticsearch.yml.

A low-downtime approach would be to change it there and then perform a rolling restart of your nodes.
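For illustration, the two options look roughly like this (values are placeholders, not recommendations):

# Option 1: default context -- static, set in elasticsearch.yml on each node, then rolling-restart
script.cache.max_size: 400

# Option 2: per-context -- dynamic, once rate limiting is switched to use-context
PUT _cluster/settings
{
  "persistent": {
    "script.max_compilations_rate": "use-context",
    "script.context.update.cache_max_size": 400
  }
}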
