Hi! I'm trying to use UpdateByQuery to update a property of a large number of documents. I computed the new values in Python, and now I need to write them back to the index. Since each document gets a different value, I have to run the updates one by one. While traversing the collection, I call this function for each document:
def update_references(self, query, script_source):
    try:
        ubq = UpdateByQuery(using=self.client, index=self.index) \
            .update_from_dict(query) \
            .script(source=script_source)
        ubq.execute()
    except Exception as err:
        return False
    return True
Some example values are:
query = {'query': {'match': {'_id': 'VpKI1msBNuDimFsyxxm4'}}}
script_source = "ctx._source.refs = ['python', 'java']"
The problem is that when I do this, I get an error: "Too many dynamic script compilations within, max: [75/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_rate] setting".
If I change the max_compilations_rate using Kibana, it has no effect:
PUT _cluster/settings
{
"transient": {
"script.max_compilations_rate": "1500/1m"
}
}
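For reference, here is how I would apply the same setting from the Python client instead of Kibana (a sketch; the `put_settings` call is commented out so it only runs against a live cluster, and I'm not sure the transient scope is the right one for my version):

```python
# Sketch: raising the script compilation limit via elasticsearch-py.
# Assumes `client` is an elasticsearch.Elasticsearch instance.
settings_body = {
    "transient": {
        "script.max_compilations_rate": "1500/1m"
    }
}
# client.cluster.put_settings(body=settings_body)  # needs a live cluster
print(settings_body["transient"]["script.max_compilations_rate"])
```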
Anyway, I don't want the updates to depend on the performance of the machine, so it would be better to use a parameterized script. I tried:
def update_references(self, query, script_source, script_params):
    try:
        ubq = UpdateByQuery(using=self.client, index=self.index) \
            .update_from_dict(query) \
            .script(source=script_source, params=script_params)
        ubq.execute()
    except Exception as err:
        return False
    return True
So, this time:
script_source = 'ctx._source.refs = params.value'
script_params = {'value': ['python', 'java']}
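My understanding is that with parameters the script source stays byte-for-byte identical across documents, so Elasticsearch should compile it once and serve the rest from the cache. A minimal sketch of the per-document request body I am building (`doc_id` and `refs` stand in for the values coming out of my Python processing):

```python
def build_update(doc_id, refs):
    # The script source never changes; only params do, so only one
    # compilation should be needed no matter how many documents we touch.
    return {
        "query": {"match": {"_id": doc_id}},
        "script": {
            "source": "ctx._source.refs = params.value",
            "lang": "painless",
            "params": {"value": refs},
        },
    }

a = build_update("VpKI1msBNuDimFsyxxm4", ["python", "java"])
b = build_update("OZKI1msBNuDimFsy0SM9", ["c", "go"])
# Identical source for every document -> cacheable compilation
print(a["script"]["source"] == b["script"]["source"])
```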
But since I have to change the query and the parameters each time, I need to create a new UpdateByQuery instance for each document in the large collection, and I end up with the same error.
I also tried to traverse and update the large collection with:
es.update(
index=kwargs["index"],
doc_type="paper",
id=paper["_id"],
body={"doc": {
"refs": paper["refs"]  # e.g. ['python', 'java']
}}
)
But I'm getting the following error: "Failed to establish a new connection: [Errno 99] Cannot assign requested address juil. 10 18:07:14 bib gunicorn[20891]: POST http://localhost:9200/papers/paper/OZKI1msBNuDimFsy0SM9/_update [status:N/A request:0.005s"
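Since every document gets its own value, I also considered the bulk API, which sends all the partial updates in one chunked request stream instead of one HTTP request per document. A sketch of what I mean, using `elasticsearch.helpers.bulk` (the actual call is commented out; `papers` and the index name are illustrative):

```python
# from elasticsearch.helpers import bulk  # uncomment with a live cluster

def update_actions(papers, index):
    """Yield one partial-update action per paper for the bulk helper."""
    for paper in papers:
        yield {
            "_op_type": "update",
            "_index": index,
            "_id": paper["_id"],
            "doc": {"refs": paper["refs"]},
        }

# bulk(self.client, update_actions(papers, self.index))

papers = [
    {"_id": "VpKI1msBNuDimFsyxxm4", "refs": ["python", "java"]},
    {"_id": "OZKI1msBNuDimFsy0SM9", "refs": ["go"]},
]
actions = list(update_actions(papers, "papers"))
print(len(actions), actions[0]["_op_type"])
```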
Then, I tried an update by script to update a large number of documents at the same time:
q = {
"script": {
"inline": script_code,
"lang": "painless"
},
"query": {"match_all": {}}
}
es.update_by_query(body=q, doc_type='paper', index=self.index, params={"docs": papers})
And this time I got: Error: RequestError(400, 'too_long_frame_exception', 'An HTTP line is larger than 4096 bytes.')
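If I understand the client correctly, the `params=` kwarg of `update_by_query` is serialized into the URL query string, so passing the whole `papers` collection there would produce a request line far over the 4096-byte HTTP limit. Script parameters would have to go inside the request body instead. A sketch of what I think the body should look like (assuming `ctx._id` is readable in the update-by-query script context, and that a very large params map may itself need chunking):

```python
def build_update_by_query_body(papers):
    # Map each document id to its new refs list; the painless script
    # looks up the current document's id in params.
    refs_by_id = {p["_id"]: p["refs"] for p in papers}
    return {
        "script": {
            "source": "ctx._source.refs = params.refs_by_id[ctx._id]",
            "lang": "painless",
            "params": {"refs_by_id": refs_by_id},
        },
        "query": {"ids": {"values": list(refs_by_id)}},
    }

# self.client.update_by_query(index=self.index, body=body)  # live cluster

body = build_update_by_query_body(
    [{"_id": "a1", "refs": ["python"]}, {"_id": "b2", "refs": ["java"]}]
)
print(body["query"]["ids"]["values"])
```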
So, if you have any idea how to solve this problem and update my fields, it would be really appreciated.
Best,