Moving big indices around with two or more instances running in the same box, working separately

Hi there,

I would like an opinion about this scenario:

  • 4 ES instances, one node each, running in the same box but using different cluster names.
    So they won't be working together, but they will all store the same type of data.
  • A proxy running on top of these instances to route my indices to the right node.
  • ZFS snapshots as backup.

I am not completely aware of ES's capabilities, and the main reason here would be to easily move an entire index to new servers as it grows.

How does it sound to you?

Thanks

Why not just cluster the new node to the old one, then use shard allocation filtering to shift the data across?
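
As a rough sketch, assuming an index named my-index and the old node named old-node (both placeholders), excluding the old node tells Elasticsearch to relocate that index's shards onto the remaining nodes, such as a newly joined one:

PUT my-index/_settings
{
  "index.routing.allocation.exclude._name": "old-node"
}

You can watch the shards move with GET _cat/recovery?v.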

I did not know about this, but I have one question.
With this approach, wouldn't I have to declare my indices in the file?
My indices are generated on the fly, so I do not know their names before they are created.

In what file?

Sorry, I meant not necessarily in the conf file.
I would have to send a PUT for each new index I create on the fly.

For instance, for a new index 1000:
PUT 1000/_settings
{
  "index.routing.allocation.include._name": "my-node-A"
}
...

Then I would send my bulk request and repeat this process for each new index, passing the correct node name.

Currently I have around 6K indices, and that number will get much bigger.

I don't understand why you'd want to move stuff around, nor why you have this standalone "cluster" setup.

I must admit that I also do not understand exactly what you are trying to achieve. The fact that you have 6,000 indices on a cluster of that size and expect that number to grow is, however, a concern. Each shard in Elasticsearch is a Lucene index and carries some resource overhead (memory, file handles, CPU). Having that many indices and shards will unnecessarily use up a lot of system resources and is unlikely to scale well. Why do you have so many indices?
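
As an aside, you can get a quick read on how many indices and shards a cluster is carrying with the cluster stats API (the filter_path here is just one way to trim the response):

GET _cluster/stats?filter_path=indices.count,indices.shards.total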

Here is my case:

  • One client can archive many websites.
  • Each archived website has a unique id.
  • Each time the same client archives the same website, a new snapshot id is created.
  • The document content is the website's html/css/js/etc.
  • The documents are stored following this path:
    /website_id/snapshot_id/document_id
  • Each new index is dedicated to the snapshots of a particular website.
  • Each website has its own index.

That's why so many indices.

Now, about the standalone instances. The idea was proposed in order to facilitate data migration to another server as the data grows. For instance, moving an entire index's data to another dedicated server. Since the data would be stored on only one node, we could easily move the entire index.

I am trying to understand if this would be viable. That is why I would like to hear from you guys and know if there are other ways to achieve that. :slightly_smiling:

Thanks a lot

Why not have a cluster, then, when you need to, add larger nodes and use filtering to move data off the smaller nodes?
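
A minimal sketch of that, assuming the small nodes are named small-node-1 and small-node-2 (placeholder names):

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "small-node-1,small-node-2"
  }
}

Elasticsearch will then rebalance the shards onto the remaining, larger nodes, and the small nodes can be decommissioned once they are empty.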

Having a separate index per website wastes a lot of resources and will not scale. If the structures of the documents you are indexing for the different websites are similar, and/or you can control the mappings, I would recommend storing multiple websites, if not all of them, in a single index. You can then either add filters at the application layer or use filtered aliases when accessing the data. If you always query per website, you can also use routing to ensure that all documents belonging to a single website reside in a single shard, which can improve query latency and throughput.
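
For example, a filtered alias with routing might look like this (a sketch, assuming a shared index named websites and a website_id field in the documents; both names are placeholders):

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "websites",
        "alias": "website-1000",
        "filter": { "term": { "website_id": "1000" } },
        "routing": "1000"
      }
    }
  ]
}

Searches against website-1000 then only touch the single shard that routing value maps to.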