Just wanted to let everyone know that we (Aconex) have open-sourced an ASL2
licensed utility to look for any inconsistencies between the information
stored in ElasticSearch and a JDBC database (with extension points for
You can find the project here: https://github.com/Aconex/scrutineer
First off: this tool was NOT developed because ElasticSearch is buggy.
NOOOO, it exists because inconsistencies can happen. If you rely on the
data stored in ElasticSearch being accurate (that is, on your client
application sending it the right info), then Scrutineer can help. It is
especially useful for applications where Near-Real-Time (NRT) indexing is
part of the use case.
Scrutineer compares the Version property stored in each ElasticSearch
record with the matching one from your source of truth (say, a DB) and
reports inconsistent state and missing records. This relies on you indexing
the Version property using the VersionType.EXTERNAL flag.
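To make the comparison idea concrete, here is a minimal sketch in Python of
one way a version check like this could work. It is not taken from the
Scrutineer code base: the function name, the (id, version) tuples and the
three result labels are assumptions made for illustration, and it assumes
both sides can be streamed sorted by id using the same ordering.

```python
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[str, int]  # (id, version); hypothetical shape for illustration


def compare_versions(
    source: Iterable[Record], index: Iterable[Record]
) -> Iterator[Tuple[str, str]]:
    """Walk two id-sorted streams in lockstep and yield (problem, id) pairs.

    `source` is the source of truth (e.g. rows from a DB query ordered by id),
    `index` is what the search index currently holds, ordered the same way.
    """
    src_iter, idx_iter = iter(source), iter(index)
    src: Optional[Record] = next(src_iter, None)
    idx: Optional[Record] = next(idx_iter, None)

    while src is not None or idx is not None:
        if idx is None or (src is not None and src[0] < idx[0]):
            # Present in the source of truth but absent from the index.
            yield ("missing_in_index", src[0])
            src = next(src_iter, None)
        elif src is None or idx[0] < src[0]:
            # Present in the index but no longer in the source of truth.
            yield ("missing_in_source", idx[0])
            idx = next(idx_iter, None)
        else:
            # Same id on both sides: the versions must agree.
            if src[1] != idx[1]:
                yield ("version_mismatch", src[0])
            src = next(src_iter, None)
            idx = next(idx_iter, None)


db_rows = [("1", 3), ("2", 5), ("4", 1)]
es_docs = [("1", 3), ("2", 4), ("3", 7)]
problems = list(compare_versions(db_rows, es_docs))
```

Because both streams are consumed once and in order, a check of this shape
only needs to hold one record from each side in memory at a time.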
Scrutineer can be used in many cases where a full reindex would be very
costly, such as:
* Detecting and reindexing 1-50% of your index may be quicker than
  performing a full reindex. Once the error rate exceeds 50% it may just
  be quicker to reindex everything, though. Your mileage may vary.
* Reindex preparation - You could prepare a new copy of your index on your
  cluster (or a different one) using a snapshot of your production DB (to
  minimise adverse load/disruptions to production customers), then use
  Scrutineer to quickly find and index what's changed since that DB
  snapshot, then switch your index alias to the new one.
* "Filesystem" scrubber - Run it regularly in production to find errors and
  hand them off to another tool to fix before they affect your customers.
* Disaster Recovery - If you periodically copy a gateway snapshot to a DR
  location to keep it closely in sync with your DB, Scrutineer can be used
  at Disaster time (heaven forbid) to make sure your index is in sync with
  your recovered DB.
* If you ever had to take your database 'back in time' due to a
  catastrophic failure, you could recover your ElasticSearch index state
  quickly by 'rolling back' changes: deleting records that are no longer in
  the DB and freshening stale entries back to their earlier states. This
  may be much faster than a full reindex, depending on how far back you
  have to go.
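As a rough illustration of the 'find what differs, then fix only that' idea
in the use cases above, here is a small Python sketch. Everything in it is
hypothetical (the function names, the dict-of-versions inputs, and the 0.5
threshold, which simply mirrors the rule of thumb quoted above); it is not
part of Scrutineer, which only reports the differences.

```python
from typing import Dict, List, Tuple


def plan_repairs(
    db_versions: Dict[str, int], index_versions: Dict[str, int]
) -> Tuple[List[str], List[str]]:
    """Return (ids to reindex from the DB, ids to delete from the index)."""
    # Reindex anything the index is missing or holds at a different version.
    to_reindex = sorted(
        doc_id
        for doc_id, version in db_versions.items()
        if index_versions.get(doc_id) != version
    )
    # Delete anything the index holds that the DB no longer knows about.
    to_delete = sorted(
        doc_id for doc_id in index_versions if doc_id not in db_versions
    )
    return to_reindex, to_delete


def should_full_reindex(bad: int, total: int, threshold: float = 0.5) -> bool:
    """Assumed heuristic: past the threshold, a full reindex may be quicker."""
    if total == 0:
        return False
    return bad / total > threshold


db = {"a": 2, "b": 1, "c": 5}
idx = {"a": 2, "b": 3, "d": 9}
to_reindex, to_delete = plan_repairs(db, idx)
```

With the sample data, "b" is stale and "c" is missing, so both would be
reindexed, while "d" would be deleted. The right threshold for giving up and
reindexing everything depends on your data, so treat 0.5 as a starting point
only.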
Scrutineer is pretty fast: for reference, it took 3.5 seconds to verify 275k
records from a 2-node ES cluster (with vanilla config) against a database
There's a tarball download here:
If you have any questions, please let us know.