Is there a way to determine if a reindex task has already completed?

I have two clients/apps using an Elasticsearch instance, and I want them to perform an async reindex task on an index in a coordinated fashion so that they don't redo each other's work (i.e., avoid both clients triggering a reindex at the same time, or one triggering it after a reindex has already completed). While I could coordinate between the clients, I'm trying to brainstorm ways, without explicit coordination/synchronization, for both of them to detect that a reindex operation is in progress or has already completed. That way node A can perform the reindex and node B does nothing, because it can see that someone else is reindexing or has already finished. Naively, the client/app that does nothing can simply check the Tasks API to see whether the reindex is in progress. That works, no problem...

However, it's possible that node A finishes reindexing before node B checks the Tasks API, in which case there simply won't be a reindex task, because completed tasks are not returned by the Tasks API. How, then, will node B know that the reindex has already happened? If a reindex is neither in progress nor already done, the roles are swapped: node B ought to be the one to trigger the reindex and node A should wait.

Note: I am using the Reindex API (Reindex documents | Elasticsearch API documentation) with wait_for_completion=false, so a task is created and its task ID is returned to the caller.

Is it possible to do what I'm doing without explicit coordination between the clients/apps?

Hello!

One way to approach this is to create a distributed lock with a third index to monitor the reindexing status.

Create a lock status index, e.g., reindex-lock:

    PUT reindex-lock
    {
      "mappings": {
        "properties": {
          "source_index": { "type": "keyword" },
          "dest_index": { "type": "keyword" },
          "status": { "type": "keyword" },  // "in_progress", "completed"
          "timestamp": { "type": "date" }
        }
      }
    }

When a client wants to reindex, query reindex-lock for an existing doc matching the source/destination index. If one exists with status "in_progress" or "completed", do nothing. Otherwise, insert a doc with status "in_progress", using op_type=create to ensure only one client wins:

    PUT reindex-lock/_doc/reindex-from-foo-to-bar?op_type=create
    {
      "source_index": "foo",
      "dest_index": "bar",
      "status": "in_progress",
      "timestamp": "..."
    }

If this PUT fails with a 409, another client already started it. If it succeeds, continue to reindex and later update status to "completed".

This is basically a distributed lock using Elasticsearch's document versioning and op_type=create.
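As a rough illustration of the acquire/complete steps, here is a minimal Python sketch using only the standard library. The helper names (es_request, try_acquire_lock, mark_completed) and the ES_URL value are hypothetical and assume an unsecured cluster on localhost; adapt them to whatever client and auth you actually use. It is a sketch of the idea, not a hardened implementation, and it does not handle crashed clients or stale locks.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

ES_URL = "http://localhost:9200"  # assumption: local cluster without auth


def es_request(method, path, body=None):
    """Send a JSON request to Elasticsearch; return (status_code, parsed_body)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read() or b"{}")
    except urllib.error.HTTPError as err:
        # Non-2xx responses (such as 409) arrive here; return them to the caller.
        return err.code, json.loads(err.read() or b"{}")


def now_iso():
    """Current UTC time in a format the default 'date' mapping accepts."""
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds")


def try_acquire_lock(source, dest, request=es_request):
    """Return True if this client created the lock doc, False on a 409 conflict."""
    doc_id = f"reindex-from-{source}-to-{dest}"
    status, body = request(
        "PUT",
        f"/reindex-lock/_doc/{doc_id}?op_type=create",
        {
            "source_index": source,
            "dest_index": dest,
            "status": "in_progress",
            "timestamp": now_iso(),
        },
    )
    if status == 201:
        return True   # we won: go ahead and start the reindex
    if status == 409:
        return False  # a lock doc already exists: someone else started (or finished)
    raise RuntimeError(f"unexpected response {status}: {body}")


def mark_completed(source, dest, request=es_request):
    """Partially update the lock doc once the reindex has finished."""
    doc_id = f"reindex-from-{source}-to-{dest}"
    status, body = request(
        "POST",
        f"/reindex-lock/_update/{doc_id}",
        {"doc": {"status": "completed", "timestamp": now_iso()}},
    )
    if status != 200:
        raise RuntimeError(f"unexpected response {status}: {body}")
```

A client would call try_acquire_lock("foo", "bar"), start the reindex only if it returns True, and call mark_completed("foo", "bar") after the reindex task finishes. The request parameter is there so the HTTP call can be swapped out, for example in tests.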

Because I do this reindexing every time I upgrade to a new Elasticsearch major version, so that the index stays within 2 major versions of the server, this lock index will need to get reindexed too, and maybe there's a bootstrapping issue: to reindex this index with multiple clients, it also needs a lock from a different index, reindex-lock-2... and so on...

This seems like a correct approach but in terms of implementation/maintenance I'll have to:

  1. create the index
  2. reindex it on every ES major version upgrade
  3. deal with failure conditions: ensure that status eventually becomes 'completed' or that the lock gets deleted; handle the case where the client/app creates the doc but crashes before triggering the reindex; and the opposite case, where the reindex is triggered but the client/app crashes before creating the doc
  4. ...

Wondering if there's a simpler solution...