Creating a New Cluster -- To Use or Not to Use Virtualization?

We are in the process of securing colocation space to install new servers into a rack and to move our current Elasticsearch clusters into one larger cluster. I've read some material on virtualization and ES, and I was hoping to get feedback from members of the ES development team.

I believe virtualization has its merits when deploying a large infrastructure, but I'm not quite sure virtualization is the best way to approach a new ES cluster. My thinking is that if one bare metal machine would otherwise house one ES node, then adding a virtualization layer between the bare metal and ES doesn't serve much purpose. Also, if a machine were to be divided up into multiple ES hosts, then redundancy planning would be complicated by having to configure around the possibility that a primary shard would live on the same bare metal server as its replica shard.

I would like to install ES directly on the OS without any virtualization layer, because I want the full resources of that machine to go to one Elasticsearch node and I want to squeeze every ounce of performance out of the machine. However, others on the team feel that virtualization is a better plan regardless of how many nodes end up on one physical server.

What are the opinions of the ES development team when it comes to adding a virtualization layer into an ES cluster where multiple nodes could end up sharing the resources of one machine? Is it better to forgo the virtualization layer and use smaller machines (like in a blade configuration) to house each node?

Thanks for your time!

PS: This cluster will eventually span hundreds of nodes and contain petabytes of data, so I want to take the best approach when building its foundation so that we can scale out quickly and easily.

It Depends™ :wink: There are existing users successfully using virtualisation, and others equally successfully using bare-metal installations.

I could imagine some advantages when you look at the bigger picture. For instance, if the rest of your infra is virtualised and your infra folks are more comfortable monitoring and managing virtualised hosts, then IMO that's a very compelling reason to use virtualisation.

Conversely, virtualisation is usually pretty lightweight so it's hard to build a robust argument for a bare-metal installation on performance grounds alone. At least, not without doing some realistic benchmarking to show the difference it makes. And even then, the management advantages might well be worth a bit of performance overhead.

If you do run multiple nodes on each host then you will need some way to isolate them from each other, but you can achieve that with virtualisation or cgroups (or likely a bunch of other technologies too).

Thank you David. I always enjoy reading your responses. I probably was over-generalizing when I said adding a virtualization layer between the bare metal and ES doesn't serve much purpose. There are indeed pros and cons for taking either approach.

And you are correct, it does depend on the use case and a lot of other variables. I guess what I was looking for were some major "gotchas" to going down the virtualization path vs. bare-metal (with the consideration that only one node would be created on any bare-metal machine).

Thanks again for your input! Your last paragraph is especially important. My guess is that using a config parameter (like the rack awareness option) to let ES know which nodes are part of the same physical server would keep the cluster from putting a primary shard and its replica on the same physical machine.
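A minimal sketch of that idea, assuming the option in question is Elasticsearch's shard allocation awareness: each node is tagged with a custom attribute naming its physical server, and the cluster is told to spread shard copies across the values of that attribute. The attribute name `physical_host` and the helper `awareness_settings` are made up for illustration, and the snippet only builds a request body for the cluster settings API; it does not contact a cluster.

```python
import json


def awareness_settings(attribute="physical_host"):
    """Build a body for the cluster settings API (PUT _cluster/settings).

    `attribute` is a hypothetical custom node attribute. Each node would
    also need a matching line in its elasticsearch.yml, for example
    `node.attr.physical_host: metal-a`, naming the bare-metal server
    that the node (or its VM) runs on.
    """
    return {
        "persistent": {
            # Ask the allocator to spread the copies of each shard across
            # distinct values of the attribute where it can.
            "cluster.routing.allocation.awareness.attributes": attribute,
        }
    }


if __name__ == "__main__":
    print(json.dumps(awareness_settings(), indent=2))
```

Awareness is a preference that the allocator follows when enough distinct hosts are available, so it is still worth verifying the resulting shard placement rather than assuming it.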

I think my biggest argument against virtualization was the possibility of a bad config setting putting redundancy in jeopardy, so if we go the virtualization route we will have to pay special attention to where the shards eventually end up. If other organizations haven't seen much real-world performance degradation from virtualization, I'm probably putting too much emphasis on that con.
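One way to keep an eye on where the shards end up is a small placement audit. The sketch below assumes you can export one row per shard copy (for example from the `_cat/shards` API) and that you maintain your own mapping from node name to physical server; the function name, the row layout and the sample host names are all hypothetical.

```python
from collections import defaultdict


def find_colocated_copies(shard_copies, node_to_host):
    """Report shards that have two or more copies on one physical host.

    shard_copies: iterable of (index, shard_number, node_name) tuples,
                  one tuple per primary or replica copy.
    node_to_host: dict mapping node name to physical host name
                  (a hypothetical inventory kept outside Elasticsearch).
    Returns {(index, shard_number): [hosts holding more than one copy]}.
    """
    copies_per_host = defaultdict(lambda: defaultdict(int))
    for index, shard, node in shard_copies:
        copies_per_host[(index, shard)][node_to_host[node]] += 1

    problems = {}
    for key, counts in copies_per_host.items():
        crowded = sorted(host for host, n in counts.items() if n > 1)
        if crowded:
            problems[key] = crowded
    return problems


if __name__ == "__main__":
    # Two VMs on metal-a hold both copies of shard 0: a redundancy risk.
    hosts = {"es-vm-1": "metal-a", "es-vm-2": "metal-a", "es-vm-3": "metal-b"}
    copies = [
        ("logs", 0, "es-vm-1"),
        ("logs", 0, "es-vm-2"),
        ("logs", 1, "es-vm-1"),
        ("logs", 1, "es-vm-3"),
    ]
    print(find_colocated_copies(copies, hosts))
```

An empty result means no single physical server holds more than one copy of any shard in the rows you passed in.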

Thank you!

PS: We are currently using Amazon for our Elasticsearch cluster, but I wasn't very happy with how they treated your team, so we're moving off their search product. I can't support or condone service providers as big as Amazon treating open-source development teams and projects poorly. But that's another discussion for another day.

You and your team have made a wonderfully robust product that really adds a lot of power to any search-based infrastructure, so thank you again for all that you guys (and gals) do!
