How to perform automatic update_by_query on an index?

Here is an example:

POST /test-scholarship/_doc
{
  "student": "George",
  "grade": 98
}

I want to add a new field (scholarship) to the above doc.

POST /test-scholarship/_update_by_query
{
  "script": "ctx._source.scholarship = 'Yes'",
  "query": {
    "grade": {
      "row": {
        "gte": 95
      }
    }
  }
}

Then

GET /test-scholarship/_search
returns:

"hits" : [
  {
    "_index" : "test-scholarship",
    "_type" : "_doc",
    "_id" : "ORawAm0BHjv4SKuqp9C8",
    "_score" : 1.0,
    "_source" : {
      "student" : "George",
      "grade" : 98,
      "scholarship" : "Yes"
    }
  }
]

Question:

  • How can I automate this update_by_query task? I don't want to run this query every once in a while to update my documents with a new field, I want to run this query automatically on my data every day. Is it possible?

I appreciate any feedback or suggestions.

Bump

You can not automate it but might be able to run the calculation on ingest through an ingest pipeline you associate with the index.

What is the name of the process called? Not quite sure what to search for.

Start here: https://www.elastic.co/guide/en/elasticsearch/reference/current/pipeline.html

Then add processors

1 Like

Solution:

Update by query using Python Elasticsearch:

if self.es.indices.exists(index = i):
    for q in queries:
        try:
            self.es.update_by_query(index = i, body = q)
        except ConflictError:
            self.es.indices.refresh(index = i)
            self.es.update_by_query(index = i, body = q)

Add a cron to run the script on your machine automatically (locally) or put it in docker and run it in ECS task on a cloud service like AWS.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.