Hi team,
For one of our update use cases, we were benchmarking Painless scripts to see how far we can scale them. In our experiments I had 200 scripts, each differing from the others only by '\n' characters, and each roughly 2 KB to 5 KB in size.
- Initially we embedded the scripts inline in the bulk API requests, with the following syntax:
POST _bulk
{"update": {"_id": "0000009035", "_index": "t1_item", "retry_on_conflict": 3}}
{"scripted_upsert": "true", "script": {"lang": "painless", "source": "\n ...<snip>.. \n", "params": {}}}
{"update": {"_id": "0000009036", "_index": "t2_item", "retry_on_conflict": 3}}
{"scripted_upsert": "true", "script": {"lang": "painless", "source": "\n\n ...<snip>.. \n", "params": {}}}
I saw that the request failed almost immediately, with the script compilation rate limit being tripped. We checked the per-node compilation counters:
curl -s 'localhost:9200/_nodes/stats?filter_path=nodes.*.script' | jq '.nodes[].script' | grep comp
"compilations": 917,
"compilations": 5434,
"compilations": 344,
"compilations": 944,
"compilations": 210,
We compared these per-node counters before and after the run; the cluster-wide total had increased by roughly 75 at the point the error occurred.
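For anyone reproducing this, the cluster-wide total can be summed in one step from the same endpoint (the jq expression is ours; host is illustrative):

curl -s 'localhost:9200/_nodes/stats?filter_path=nodes.*.script' | jq '[.nodes[].script.compilations] | add'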
- We then resorted to storing these 200 scripts, on the assumption that stored scripts are pre-compiled, so that invoking them would only incur script cache evictions and cache loading.
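Each script was stored via the _scripts endpoint, along these lines (the id is one of ours, the body is snipped as above):

PUT _scripts/item-update-script-106
{"script": {"lang": "painless", "source": "\n ...<snip>.. \n"}}

and then referenced by id from the bulk request: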
POST _bulk
{"update": {"_id": "0000000123", "_index": "t2_item", "retry_on_conflict": 3}}
{"scripted_upsert": "true", "script": {"id": "item-update-script-106", "params": {""}}}
{"update": {"_id": "0000007876", "_index": "t3_item", "retry_on_conflict": 3}}
{"scripted_upsert": "true", "script": {"id": "item-update-script-32", "params": {""}}}
However, we still see the bulk updates getting rejected, with these compilation counters again increasing by close to 75. Could someone shed more light on why our assumption was incorrect? Does this mean there is no difference between stored scripts and scripts embedded in the API, apart from the syntactic difference?
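For completeness, the same stats endpoint exposes eviction counters, which should show whether evictions are driving the recompilations. Per-node shape sketched below with illustrative numbers (the compilation_limit_triggered field appears on the recent 7.x versions we looked at; older versions expose only the first two fields):

curl -s 'localhost:9200/_nodes/stats?filter_path=nodes.*.script' | jq '.nodes[].script'
{
  "compilations": 944,
  "cache_evictions": 812,
  "compilation_limit_triggered": 3
}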
- I am curious what the consequences are of keeping a high value for the update context, say script.context.update.max_compilations_rate = 1000/1s ?
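For concreteness, this is roughly how we would apply it (a sketch; from the 7.x docs we read, per-context rates also seem to require script.max_compilations_rate to be set to use-context first, so that line is included on that assumption):

PUT _cluster/settings
{
  "transient": {
    "script.max_compilations_rate": "use-context",
    "script.context.update.max_compilations_rate": "1000/1s"
  }
}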
- From the same document: "For ingest contexts, the default script compilation rate is unlimited."
If we update our docs via an ingest script processor, will there be a performance impact at 1000 updates per second? The link below seems to suggest that re-compilation happens for every doc parsed by the ingest pipeline. So why is this limit removed for the ingest context alone? And conversely, why was the limit applied to the other contexts?
Script processor | Elasticsearch Reference [master] | Elastic
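For reference, this is the kind of ingest usage we have in mind, sketched with an illustrative pipeline name, field, and source:

PUT _ingest/pipeline/item-update-pipeline
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.updated_at = params.ts",
        "params": {"ts": "2021-06-01T00:00:00Z"}
      }
    }
  ]
}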