Replica and non-replica shards?


(Michael Li Zhou) #1

Has anyone tried to create an ES cluster where some shards have replicas and others do not? I am only working on one server, so I haven't seen the point of having replicas and currently have none. However, I am about to store some important data that I will then delete from a database (basically moving it entirely to ES), and I would like to keep a replica of only this data. The rest I do not really care about.


(Magnus Bäck) #2

To use replicas you need more than one node in the cluster. ES will refuse to allocate a primary shard and its replica on the same node.
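The allocation rule can be pictured with a toy model. This is only a sketch of the reasoning, not how ES computes health internally; the function name `cluster_status` is made up for illustration, and it assumes every copy of a shard (the primary plus each replica) needs a node of its own.

```python
def cluster_status(num_nodes, replicas_per_primary):
    """Toy model of cluster health (hypothetical helper, not an ES API).

    green  = every primary and every replica is assigned
    yellow = primaries are assigned, but at least one replica is not
    red    = at least one primary is unassigned
    """
    if num_nodes < 1:
        return "red"  # no node available to hold even the primaries
    copies_needed = 1 + replicas_per_primary  # the primary plus its replicas
    if num_nodes >= copies_needed:
        return "green"  # each copy can sit on its own node
    return "yellow"  # some replica has nowhere to go


print(cluster_status(num_nodes=1, replicas_per_primary=1))  # yellow
print(cluster_status(num_nodes=1, replicas_per_primary=0))  # green
print(cluster_status(num_nodes=2, replicas_per_primary=1))  # green
```

With a single node, any index that asks for one or more replicas leaves those replicas unassigned, which is what a yellow status reports.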

The number of replicas is an index setting so it's totally fine to have one or more replicas for index A but zero replicas for index B.
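As a minimal sketch of what that looks like in practice, the snippet below builds (but does not send) two update-settings requests using only the Python standard library. The index names `important-data` and `scratch-data` and the `http://localhost:9200` address are assumptions made up for this example; `number_of_replicas` is the index setting in question.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default port


def replica_settings_request(index, replicas):
    """Build a PUT /<index>/_settings request that sets number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical index names: one replica for the data that matters, none for the rest.
important = replica_settings_request("important-data", 1)
scratch = replica_settings_request("scratch-data", 0)

for req in (important, scratch):
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))

# To actually apply a change against a running cluster:
# urllib.request.urlopen(important)
```

The requests are only constructed here so the example runs without a cluster; sending them is left commented out.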


(Michael Li Zhou) #3

Understood. So now I wonder: do people actually run one node per machine, so that 20 nodes means 20 servers? I only ask because I am developing on one machine and running several nodes on it so that I am allowed to have replicas when I need them. I have the ability to get more blades, but is it necessary?

Thanks,
M


(Nik Everett) #4

Only if you want to test replication. In a production system, having fewer than three machines is a poor idea because you can't do rolling restarts. In a development system you usually don't care about testing replication. If you do want to test it, you can spin up three virtual machines or something similar. If you really, really want to test replication on a single machine, there are settings you can tweak that allow more than one copy of a shard per physical machine, but I don't know their names; you'd have to go digging. It's generally a bad idea anyway.


(Magnus Bäck) #5

Understood. So now I wonder do people actually have 1 Node per machine? So if you have 20 nodes you have 20 servers.

That's the most common case, yes.

I only ask because I am developing on one machine and running several nodes on it so that I am allowed to have replicas when I need them. I have the ability to get more blades, but is it necessary?

Necessary for accomplishing what?


(Michael Li Zhou) #6

Well, at first I just wanted 2 nodes on 1 machine to get the cluster off the yellow status if I were to have replicas. But my initial impression is that more nodes means faster queries and the ability to handle larger loads. I want to be able to handle at least 8-10 GB of data a day, and I was thinking one blade would be enough to handle that plus any replicas. I should probably make a separate thread, but I now wonder whether having more than 2 ES nodes on a single blade has any point. Does it make queries quicker? Or handle larger amounts of data?

Thanks,
M


(Magnus Bäck) #7

Well at first I just wanted 2 nodes on 1 machine just to get it off the yellow status if I were to have replicas.

If the yellow cluster status bothers you, you should disable the replicas, not start another ES node on the same box.

But my initial impression is more nodes means faster queries and ability to handle large loads.

It's not the number of ES processes that helps performance; it's the fact that they run on separate hardware. One of the few times it makes sense to run multiple ES nodes on the same machine is if that machine has a lot of RAM, like 128 GB or more, and having a single JVM with a 64 GB heap wouldn't give the best performance.
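The arithmetic behind the 128 GB example can be sketched as below. The thread does not state these figures; they are assumptions based on commonly cited rules of thumb: give roughly half of the RAM to JVM heap (leaving the rest for the filesystem cache) and keep each heap at about 31 GB or less so the JVM can still use compressed object pointers. The helper name `suggested_node_count` is hypothetical.

```python
import math


def suggested_node_count(ram_gb, heap_fraction=0.5, max_heap_gb=31):
    """Rough estimate of how many ES nodes a single machine could host.

    Assumptions (rules of thumb, not taken from the thread):
      * heap_fraction of the RAM is given to JVM heap in total
      * no single JVM heap should exceed max_heap_gb
    """
    heap_budget_gb = ram_gb * heap_fraction
    # Always at least one node, even if the budget is below the per-JVM cap.
    return max(1, math.floor(heap_budget_gb / max_heap_gb))


print(suggested_node_count(32))   # 1
print(suggested_node_count(64))   # 1
print(suggested_node_count(128))  # 2
```

Under these assumptions a 128 GB machine has a heap budget of 64 GB, which is better split across two JVMs than given to one, while machines with 64 GB or less are served by a single node.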

