Using snapshot&restore to separate indexing from searching

anon44344346 · April 30, 2014, 3:03pm

As I posted before, our system does not fit very well into a cluster
structure because we have many small indices (about 1k indices with an
average of 6k records each). We guessed that with so many small indices,
the cluster spent too much time and resources deciding which node should be
master, where to allocate absurdly small shards, etc. The bottom line is
that the cluster always ended up not working right. BTW, I suspect that
with a few advanced cluster tuning options (shard routing and the like) we
may be able to run it as a cluster again, but unfortunately we can't find
that kind of knowledge in the standard docs. If any of you have any hints
on this, they would be greatly appreciated!

Anyway, we need to scale the system somehow, and this is what we've come up
with:

Our indices can have configuration variations that make a reindex
necessary at any time. It doesn't happen a lot, but it happens, and with 1k
indices, it's bound to happen.

Indexing data is regenerated every day, so every day the whole set of
indices is re-created (we figured it's much faster to "recreate" an index
than to update an existing one by replacing every one of its records).
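A minimal Python sketch of the daily "recreate instead of update" step described above: drop the index, create it again, and bulk-load every record. It assumes the request shapes of recent Elasticsearch versions (the 1.x releases current at the time of this post also expected a `_type` in each bulk action line). The function only builds the list of requests, leaving the HTTP transport to you; `recreate_plan`, the `"id"` field and the example index name are hypothetical.

```python
import json


def recreate_plan(index, records, settings=None):
    """Hypothetical helper: return (method, path, body) tuples that drop and
    rebuild one index from scratch. Each record is assumed to carry an "id"."""
    create_body = json.dumps({"settings": settings or {}})
    bulk_lines = []
    for rec in records:
        # The _bulk format pairs an action line with the document source line.
        bulk_lines.append(json.dumps({"index": {"_index": index, "_id": rec["id"]}}))
        bulk_lines.append(json.dumps(rec))
    # The _bulk body is newline-delimited JSON and must end with a newline.
    bulk_body = "\n".join(bulk_lines) + "\n"
    return [
        # A 404 here just means the index did not exist yet.
        ("DELETE", f"/{index}", None),
        ("PUT", f"/{index}", create_body),
        ("POST", "/_bulk", bulk_body),
    ]


plan = recreate_plan("products", [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
for method, path, _ in plan:
    print(method, path)
```

Rebuilding this way never touches the old documents one by one, which matches the observation that re-creating is faster than updating every record in place.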

We would like the machines used for searching to be used only for that,
and never for indexing/reindexing ops, because we don't want the user
experience to suffer when a search hits a server that is already loaded
with heavy indexing.

In our ideal scenario, indexing/reindexing would be done on dedicated
machines, as many as needed, and searching would be done on different
machines. We plan to use the snapshot/restore feature for that.

Any time an index/reindex is needed, it would be done on one of these
"indexing machines"; the fresh index would then be snapshotted and
afterwards restored to the search machines. We would need some client-side
control to make sure only one snapshot process runs at a time. My
understanding is that this is not the case for restore (i.e. you can have
more than one restore process running on a cluster).
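The snapshot-then-restore flow above could look roughly like this minimal Python sketch. It assumes both clusters have registered the same shared snapshot repository (for example a shared-filesystem repository created with PUT /_snapshot/<repo>), and that `send` is a transport you supply, such as a thin wrapper around an HTTP client. `SnapshotShipper`, `ship` and the URLs are hypothetical names. The `threading.Lock` only serializes snapshots started from this one process; several client processes would need an external lock.

```python
import json
import threading


class SnapshotShipper:
    """Hypothetical client-side helper: snapshot an index on the indexing
    cluster, then restore it on the search cluster.

    `send(base_url, method, path, body)` is an injected transport that
    returns the parsed response. Both clusters are assumed to see `repo`.
    """

    def __init__(self, send, indexing_url, search_url, repo):
        self.send = send
        self.indexing_url = indexing_url
        self.search_url = search_url
        self.repo = repo
        # Serializes snapshots issued by this process only.
        self._snapshot_lock = threading.Lock()

    def ship(self, index, snapshot_name):
        with self._snapshot_lock:
            # wait_for_completion blocks until the snapshot is done, so the
            # next snapshot from this process cannot overlap with it.
            self.send(
                self.indexing_url, "PUT",
                f"/_snapshot/{self.repo}/{snapshot_name}?wait_for_completion=true",
                json.dumps({"indices": index, "include_global_state": False}),
            )
        # Restores run outside the lock. An open index cannot be restored
        # over, so the old copy is deleted first (a real transport should
        # ignore the 404 returned the first time, when it does not exist).
        self.send(self.search_url, "DELETE", f"/{index}", None)
        return self.send(
            self.search_url, "POST",
            f"/_snapshot/{self.repo}/{snapshot_name}/_restore?wait_for_completion=true",
            json.dumps({"indices": index, "include_global_state": False}),
        )
```

Deleting before restoring means that index is briefly unavailable on the search cluster. Closing it instead (POST /<index>/_close) also works, provided the shard count matches the snapshot.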

Indexing of individual items can happen occasionally, but I figure when
that happens we can just index to both the search machines and the
indexing machines, because it's never going to be big.
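For the occasional single-item update, writing the same document to both clusters might look like this minimal Python sketch. It assumes the PUT /<index>/_doc/<id> form used by recent Elasticsearch versions (1.x used a mapping type in place of `_doc`), an injected `send` transport, and hypothetical cluster URLs; `index_everywhere` is likewise a made-up name.

```python
import json

# Hypothetical cluster addresses; replace with your own.
CLUSTERS = ("http://indexing-cluster:9200", "http://search-cluster:9200")


def index_everywhere(send, index, doc_id, doc, clusters=CLUSTERS):
    """Send the same single-document write to every cluster in `clusters`,
    one after another, and return the responses in cluster order.

    `send(base_url, method, path, body)` is an injected transport. There is
    no retry or rollback: if a write fails, the clusters can diverge until
    the next daily rebuild replaces the index on both sides.
    """
    body = json.dumps(doc)
    return [send(url, "PUT", f"/{index}/_doc/{doc_id}", body) for url in clusters]
```

Given how small and rare these updates are, the lack of any cross-cluster consistency guarantee may be acceptable; the daily rebuild and restore would overwrite any drift.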

(Please read "cluster" wherever I wrote "machine".)

How crazy does this whole thing sound? Is there any other way we can get
some scalability?


Related topics

Topic (category)	Replies	Views	Last activity
Reindexing indices between clusters (Elasticsearch)	32	2780	January 8, 2018
Snapshot & Restore : 6.2 version (Elasticsearch)	11	1279	March 21, 2019
Restoring to a different cluster (Elasticsearch, snapshot-and-restore)	7	230	July 4, 2022
Around 1500 indexes to snapshot and restore in another cluster. How would you do it? (Elasticsearch, snapshot-and-restore)	9	189	March 21, 2024
Problem with Snapshot-Restore in Two-Node-Cluster with Shards (Elasticsearch)	1	340	July 6, 2017