Reindexing indices between clusters

Hi @Christian_Dahlqvist,

I have a question about shard allocation filtering. Let's say we have two zones: indexing and serving. Can we replicate shards so that, after indexing, only one replica moves to the serving zone while the second replica stays in the indexing zone? Then, in the next indexing iteration, we could reuse the existing index in the indexing zone, because the second replica is still there, and we would not have to build the whole index from scratch.
In addition, is it possible to rename an existing index?

Thanks

You could do this through shard allocation awareness, but queries would then hit any shard in the cluster, irrespective of zone. If you index into this, indexing will naturally also be performed on all shards irrespective of zone.

I am not sure I understand what you envision gaining from this.

No, renaming an existing index is not possible. It is however common to use an alias to hide the underlying index and allow it to be quickly switched, e.g. when a new version is available.
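As a rough sketch of the alias approach, the snippet below builds the request body for `POST /_aliases` that moves an alias from an old index to a new one in a single request. The alias and index names (`products`, `products_v1`, `products_v2`) and the helper name are made up for illustration; sending the request is left to whatever HTTP client you use.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build a POST /_aliases body that repoints `alias` from old_index to new_index.

    Both actions travel in one request, so searches against the alias
    never see a moment where it points at no index.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names, used only for illustration.
body = build_alias_swap("products", "products_v1", "products_v2")
print(json.dumps(body, indent=2))
```

Clients keep querying `products` throughout; only the index behind the alias changes.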

What I hope to gain from this is the ability to update and change an existing index in the indexing zone without having to index it from scratch. I have a bunch of changes stored in files, along with the whole index, and I would like to avoid indexing the whole thing from scratch, as I explained. If I have a version of the index that is ready to serve in one zone, and the same version is also replicated in the indexing zone, I can apply my small batch of changes to that index instead of rebuilding it from scratch.
If the way I thought of doing it isn't in fact possible, do you have any other suggestion for solving it?

The reason I suggested an indexing zone was to keep the indexing load away from the serving nodes for the full rebuild.

For small incremental changes to an index being searched, I would update the index directly in the serving zone.

Is there a way to copy/duplicate a shard before moving it from the indexing zone to the serving zone?

Do you mean copy/duplicate the index?

I am not aware of any easy and efficient way to do that.

I'll try it from another direction:
Let's say I have one cluster with 12 nodes:
6 nodes defined in the index/worker zone and 6 nodes defined in the serving zone.
Each index has 3 shards, each replicated once (each shard appears twice).
Can I enforce (with the same shard allocation filtering or something else) where the replica is stored?
I want to see shard 1 on node 1, shard 2 on node 2 and shard 3 on node 3.
Then I want to see the shard 1 replica on node 4, the shard 2 replica on node 5 and the shard 3 replica on node 6.
Last thing: if I place the first 3 nodes and the last 3 nodes in different availability zones, what can you say about the performance?

Thanks again,
Itay

What are you looking to achieve with this arrangement? What is the problem you are looking to solve?

The ability to serve the same indices in two different availability zones

So you want a single cluster with two zones, where the primary is on one side and the replica on the other? This is common when deploying e.g. in the cloud, as you want to distribute data across racks or availability zones to improve resiliency. This basically means you have one cluster where all shards participate in indexing and all shards can serve queries. You generally cannot control which shard is made primary or replica, but in most cases primary and replica shards do the same amount of work, so that often does not matter.

I do not, however, understand how this solves or relates to the problems you initially described. Are you now considering just performing delta updates and avoiding full rebuilds? Could you please describe what you expect this setup to give you?

One cluster, 16 nodes, two 'logical' zones: worker (indexing) and server (serving); 8 nodes each.
This cluster will also be separated into two availability zones A & B.
That gives us 4 node groups:
(1) 4 nodes in worker zone, availability zone A
(2) 4 nodes in worker zone, availability zone B
(3) 4 nodes in server zone, availability zone A
(4) 4 nodes in server zone, availability zone B

  • full rebuild will be done via group (1) since our builder process runs there.
  • delta updates will be done directly on the server zone via group (3).
  1. I assume that if group (3) is completely unavailable, group (4) can serve, and vice versa. That's why I want to ensure that a copy of every shard (primary or replica) is available in each availability zone. Is that right?
  2. How can I ensure that?
  3. I don't care which shard is primary; I care about the performance when using two AZs in the same cluster. Can you list the cons please?

Thank you!

You can use shard allocation filtering to create the server and worker areas of the cluster. Then allocate indices to the correct area based on these parameters. An example of how this is used is when implementing a hot/warm architecture for logging use cases.
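A minimal sketch of what that could look like, assuming each node is tagged in `elasticsearch.yml` with a custom attribute such as `node.attr.area: worker` or `node.attr.area: server`. The attribute name `area`, the index name and the helper are hypothetical; the setting key follows the `index.routing.allocation.require.<attribute>` pattern, and the body would be sent with `PUT /<index>/_settings`.

```python
import json


def require_area(area, attribute="area"):
    """Build an index settings body that pins an index to nodes whose
    custom node attribute `attribute` equals `area`.

    `attribute` is a hypothetical name; it must match whatever
    node.attr.<name> you configured on the nodes.
    """
    return {f"index.routing.allocation.require.{attribute}": area}


# Hypothetical index name, used only for illustration.
index_name = "catalog_v2"

# Step 1: while the full rebuild runs, keep the index on the worker nodes.
build_phase = require_area("worker")

# Step 2: once the rebuild is done, change the same setting so the
# cluster relocates the shards to the server nodes.
serve_phase = require_area("server")

for phase in (build_phase, serve_phase):
    print(f"PUT /{index_name}/_settings")
    print(json.dumps(phase, indent=2))
```

This is the same mechanism a hot/warm setup uses: only the attribute value on the index changes, and the cluster moves the shards accordingly.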

In order to split each area into two availability zones, you can use shard allocation awareness. This assumes that the availability zones have good bandwidth and low latency.
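As an illustration, assuming every node carries a second custom attribute for its availability zone (for example `node.attr.az: a` or `node.attr.az: b`; the name `az` is made up here), the cluster settings body below turns on awareness for that attribute. Listing the zone values additionally enables forced awareness, so that all copies of a shard are not piled into the surviving zone when the other one is down. The second helper is only a sanity check on a shard layout you describe yourself; it does not talk to a cluster.

```python
import json


def awareness_settings(attribute, zones=None):
    """Build a PUT /_cluster/settings body enabling allocation awareness.

    attribute: name of the custom node attribute (hypothetical: "az").
    zones: optional list of zone values; if given, forced awareness is set.
    """
    settings = {"cluster.routing.allocation.awareness.attributes": attribute}
    if zones:
        key = f"cluster.routing.allocation.awareness.force.{attribute}.values"
        settings[key] = ",".join(zones)
    return {"persistent": settings}


def every_shard_spans_two_zones(layout):
    """layout maps a shard id to the list of zones holding a copy of it."""
    return all(len(set(zones)) >= 2 for zones in layout.values())


print(json.dumps(awareness_settings("az", ["a", "b"]), indent=2))

# Three shards, one replica each, with primary and replica in different zones.
layout = {0: ["a", "b"], 1: ["b", "a"], 2: ["a", "b"]}
print(every_shard_spans_two_zones(layout))
```

With one replica per shard and two zones, awareness places each shard's two copies in different zones, which is the property the check above expresses.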

Whether the zones are able to index and serve data in the absence of a full availability zone will depend on you having enough master-eligible nodes to allow a master to be elected.
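To make the arithmetic concrete, here is a small sketch that assumes a master can only be elected by a strict majority of the master-eligible nodes. The zone layouts, including the third zone `C`, are invented for the example.

```python
def quorum(master_eligible):
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def can_elect_master(master_nodes_per_zone, lost_zone):
    """True if the surviving master-eligible nodes still form a majority
    of the full set after `lost_zone` becomes unavailable."""
    total = sum(master_nodes_per_zone.values())
    surviving = total - master_nodes_per_zone[lost_zone]
    return surviving >= quorum(total)


# Master-eligible nodes split evenly across two zones: losing either
# zone leaves 2 of 4, which is below the quorum of 3.
print(can_elect_master({"A": 2, "B": 2}, lost_zone="A"))  # False

# One master-eligible node in each of three zones (hypothetical layout):
# losing any single zone leaves 2 of 3, which meets the quorum of 2.
print(can_elect_master({"A": 1, "B": 1, "C": 1}, lost_zone="A"))  # True
```

Under that assumption, an even split of master-eligible nodes across exactly two zones cannot survive the loss of either zone, however many data nodes remain.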
