Dear Community,
I'm trying to add a new field to all documents that match a certain condition.
After doing some research I found that the update_by_query API might be a suitable method to achieve this.
What I have so far is the following:
POST packets-test/_update_by_query
{
  "script": {
    "source": "ctx._source.status = 1",
    "lang": "painless"
  },
  "query": {
    "match": {
      "layers.ip.ip_ip_src": "10.7.6.2"
    }
  }
}
In other words, I want to add a "status" field with the value 1 to all documents where the field "layers.ip.ip_ip_src" (type: ip) is "10.7.6.2".
However, running this, I get the following error:
{
  "statusCode": 504,
  "error": "Gateway Time-out",
  "message": "Client request timeout"
}
Am I doing something wrong?
My (test) index has 1.5 million documents, and 484,003 of those match the above query condition.
However, the real index I would like to apply this to holds about 200 million documents.
Does update_by_query even scale to that size? If not, what would be a possible way to do this in a reasonable amount of time?
(I use Elasticsearch 6.2)
EDIT:
I have now made another attempt with a "match_all" query (i.e. to update the field in all documents): this runs through without a timeout error, although quite slowly.
- Why does it not work with a more specific query?
- How can this be made faster or parallelized (e.g. with the Python API)?
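
For reference, this is roughly what I have in mind on the Python side. It is only a minimal sketch under my own assumptions, not a tested solution: the 504 may just be the client giving up while the update keeps running, so the idea is to start the update as a background task (wait_for_completion=false) and split it into slices. The helper name build_update_by_query_request and the value slices=5 are placeholders I made up, and the code only builds the request without sending it.

```python
import json
from urllib.parse import urlencode


def build_update_by_query_request(index, ip, slices=5):
    """Build (method, path, body) for an asynchronous, sliced _update_by_query call.

    Hypothetical helper for illustration; it does not contact Elasticsearch.
    """
    params = {
        "wait_for_completion": "false",  # return a task ID instead of blocking the client
        "slices": slices,                # assumed value; splits the work into parallel sub-tasks
        "conflicts": "proceed",          # count version conflicts instead of aborting
    }
    body = {
        "script": {"source": "ctx._source.status = 1", "lang": "painless"},
        "query": {"match": {"layers.ip.ip_ip_src": ip}},
    }
    path = "/{}/_update_by_query?{}".format(index, urlencode(params))
    return "POST", path, json.dumps(body)


method, path, body = build_update_by_query_request("packets-test", "10.7.6.2")
print(method, path)
print(body)
```

If I understand the docs correctly, the response to such a request contains a task ID, and progress could then be checked with GET _tasks/<task_id>.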