Detect data inconsistency with Elasticsearch


#1

Hi!

For a specific application, we generate data (JSON documents) on 3 different servers.
This data should be consistent across all 3 servers, meaning that each server should have the same documents, with the same values for each ID, etc.

However, the application sometimes gets out of sync, and we find that a server has missing documents or unpushed modifications (for a document with a given ID, the same field has different values across the servers).

How could I use Elasticsearch to determine which documents are either missing on a server or inconsistent across the servers?

My first thought was to hash all documents and index them in ES by their hash (that is, with the hash as the document ID). Then I would only have to run a search on the _version field to find the documents that are not at version 3, meaning that they either have a different hash (unsynced) or are missing on a server (indexed and updated only once).
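To make the idea concrete, here is a rough sketch in plain Python of the hashing and counting step, without any Elasticsearch calls. The function names (doc_hash, find_inconsistent_ids) and the input shape (a dict of server name to {doc_id: document}) are made up for illustration. It counts the distinct servers that produced each hash, which is what I would hope _version tells me; as far as I understand, _version also increases whenever the same server re-indexes a document, so the real counter may be less reliable than this.

```python
import hashlib
import json
from collections import defaultdict


def doc_hash(doc_id, doc):
    """Hash the ID together with a canonical JSON form of the document.

    sort_keys makes the hash independent of key order in the source JSON.
    """
    canonical = json.dumps(
        {"id": doc_id, "body": doc}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_inconsistent_ids(servers):
    """Return the sorted IDs whose hash was not produced by every server.

    servers: {server_name: {doc_id: document}} (hypothetical input shape).
    An ID is reported if the document is missing on a server or if its
    content differs on at least one server.
    """
    servers_by_hash = defaultdict(set)  # hash -> servers that produced it
    id_by_hash = {}                     # hash -> document ID
    for name, docs in servers.items():
        for doc_id, doc in docs.items():
            h = doc_hash(doc_id, doc)
            servers_by_hash[h].add(name)
            id_by_hash[h] = doc_id
    return sorted(
        {
            id_by_hash[h]
            for h, names in servers_by_hash.items()
            if len(names) < len(servers)
        }
    )
```

With the hash as the document ID, the len(names) < len(servers) test plays the role of "not at version 3".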

Would that work? Is there a cleverer way to do that, maybe using 3 indices?

Thanks for your help!

Regards,