Excuse me in advance for the long post. Up until now, I have avoided
rebuilding from source, but now is the time to finally dive in.
I am trying to determine reindexing best practices. Using aliases appears to be
the suggested way to build an updated index in parallel with searches against
the existing one. Searchers always use the same index name (the alias), while
the indexers create a new index whenever a reindex is needed, and the alias is
then pointed at it.
While the new index is being created, do you stop indexing on the existing
index? Scan acts like a cursor, so new inserts after the scan was created
do not get picked up.
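
For reference, the alias switch I have in mind would look roughly like this. It is a minimal Python sketch that only builds the request body for POST /_aliases and does not talk to a server. The names articles, articles_v1 and articles_v2 are made up for illustration.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build the body for POST /_aliases that repoints `alias`.

    Both actions travel in one request, so searchers using the alias
    should never see it missing between the remove and the add.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: the alias searchers use, plus old and new indices.
    body = build_alias_swap("articles", "articles_v1", "articles_v2")
    print(json.dumps(body, indent=2))
```

This does not address the open question above: anything indexed into the old index after the scan started would still need to be copied over before or after the switch.
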
Does anybody use a river and reindex data? A river is associated with an
index at creation. If the above recipe holds true, I need to delete the
river and then create a new one with the updated index name. Not the best
scenario since I will lose updates, but the data is not mission critical.
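
The delete-and-recreate step for the river would be roughly the following. This is only a sketch. It builds the two requests (DELETE /_river/<name> and PUT /_river/<name>/_meta) without sending them, and the river name and settings are placeholders. It also assumes the river reads its target index from an "index" block in its meta document, which depends on the river plugin in use.

```python
import copy


def build_river_recreate(river_name, river_meta, new_index):
    """Return (method, path, body) tuples that delete a river and register
    it again pointing at new_index. Nothing is sent to a server."""
    meta = copy.deepcopy(river_meta)  # leave the caller's settings untouched
    # Assumption: this river type takes its target from meta["index"]["index"].
    meta.setdefault("index", {})["index"] = new_index
    path = "/_river/" + river_name
    return [
        ("DELETE", path, None),
        ("PUT", path + "/_meta", meta),
    ]


if __name__ == "__main__":
    # Placeholder river settings, for illustration only.
    old_meta = {"type": "couchdb", "index": {"index": "articles_v1"}}
    for method, path, body in build_river_recreate("my_river", old_meta, "articles_v2"):
        print(method, path, body)
```

Updates that arrive between the DELETE and the PUT are the ones I expect to lose.
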
On the topic of reindexing, I also need to recover data from an incorrect
node. I switched a single-node elasticsearch server from using monit to
using the service wrapper. On the same day that I made the migration,
something happened on the box (low memory?) that caused the wrapper to think
that ES was not running and start another instance. The default settings
were in place (1 replica, 5 shards), so the new instance took over some of
the shards. Upon noticing the immense slowdown on the machine due to two
instances running, I killed both processes and restarted ES. Of course, the
data from the shards that moved to the new, incorrect instance is missing. I
assume that the shards on the original instance were rebalanced, so I cannot
simply do a file copy.
My plan for data recovery is to move the second node's data dir to a new
location, start a new non-clustered ES server, and query from it. I would use
two node clients, one for each server (node). Is it possible to restart two
nodes and gracefully tell one of them to accept all of the merged data?
Probably not.