Add new field to index using update_by_query

Dear Community,

I'm trying to add a new field to all documents that match a certain condition.
After doing some research I found that the update_by_query API might be a suitable method to achieve this.
What I have so far is the following:

POST packets-test/_update_by_query
{
  "script": {
    "source": "ctx._source.status = 1",
    "lang": "painless"
  },
  "query": {
    "match": {
      "layers.ip.ip_ip_src": "10.7.6.2"
    }
  }
}

In other words, I want to add a "status" field with value 1 to all documents where the field "layers.ip.ip_ip_src" (type: ip) is "10.7.6.2".
However, when I run this, I get the following error:

{
  "statusCode": 504,
  "error": "Gateway Time-out",
  "message": "Client request timeout"
}

Am I doing something wrong?
My (test) index has 1.5 million documents, and 484,003 of those match the above query condition.
However, the real index that I would like to apply this to holds about 200 million documents.

Does update_by_query even scale to such dimensions? If not, what would be a possible way to do this in a reasonable amount of time?
(I use Elasticsearch 6.2)

EDIT:
I have now also tried a "match_all" query (i.e. updating the field in all documents): that runs through without a timeout error, although quite slowly.

  1. Why does it not work with a more specific query?
  2. How can this be made faster or parallelized (e.g. with the Python API)?
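
Regarding the two questions above, here is a minimal sketch of one possible approach, not a confirmed fix. It assumes the 504 comes from a client or proxy giving up on the long-running HTTP request rather than from Elasticsearch aborting the update, which the post does not confirm. Under that assumption, the request can be submitted with `wait_for_completion=false` so that it returns a task id immediately, and with `slices` so that the work is split into parallel sub-tasks. The helper name `build_update_by_query` and the slice count of 5 are hypothetical choices for illustration; the snippet only builds the request and sends nothing.

```python
import json
from urllib.parse import urlencode


def build_update_by_query(index, src_ip, status=1, slices=5):
    """Return (path, body) for an asynchronous, sliced _update_by_query request.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    params = {
        "wait_for_completion": "false",  # return a task id instead of blocking
        "slices": slices,                # assumed slice count; tune per cluster
        "conflicts": "proceed",          # count version conflicts, do not abort
    }
    body = {
        "script": {
            # Passing the value via params avoids recompiling per value.
            "source": "ctx._source.status = params.status",
            "lang": "painless",
            "params": {"status": status},
        },
        "query": {"match": {"layers.ip.ip_ip_src": src_ip}},
    }
    path = "/{}/_update_by_query?{}".format(index, urlencode(params))
    return path, body


path, body = build_update_by_query("packets-test", "10.7.6.2")
print("POST", path)
print(json.dumps(body, indent=2))
```

The response to such a request should contain a task id, and progress can then be checked with the Tasks API (`GET _tasks/<task_id>`). With the official Python client, the same parameters can presumably be passed to `Elasticsearch.update_by_query`, but check the signature for the client version matching Elasticsearch 6.2.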
