Elasticsearch on Docker

We are building a new Elasticsearch cluster and are trying to figure out one thing: does running Elasticsearch in a Docker container on each node add another layer of I/O overhead?

There is already I/O overhead from running a VM on top of a physical host, where the I/O needs to traverse the hypervisor. I am wondering whether adding Docker introduces further I/O delay on top of that because of the Docker engine.

Of course benchmarks could be run, but I was wondering whether the Elastic staff or anyone else on this forum could provide input.

Thank you!

You seem to have little or no knowledge about Docker or containers, because you assume a VM and "the I/O needs to traverse the hypervisor".

The truth is, there is no VM, and no hypervisor in container technology. In fact, containers are an alternative to hypervisor architecture.

If you want to find out more about Docker network latency degradation, read e.g.

http://delaat.net/rp/2014-2015/p92/report.pdf

Quote:

"Therefore, the results are not scientifically significant to prove that there is indeed a performance degradation"

Jorg,

I am fully aware that there is no hypervisor in container technology, but there is a hypervisor when using Hyper-V or VMware. The decision I am trying to make is whether to run Elasticsearch directly on the VMs or, if there is no performance impact, to run it inside Docker containers on those same VMs.

Does that make sense?

I'm opinionated in regard to VMs. I don't use VMs because I have enough bare-metal machines available. The Elasticsearch JVM is already a kind of VM. So just use VMs if you have to. If you want the best ratio of performance to money invested in hardware, VMs are not a good choice. Because hardware is so cheap, people stop caring and use VMs. That is their decision, and it may be economical, but it is not the best choice for folks like me who care about optimal performance, tuning, and maximum resource utilization.

Docker containerization relieves the developer of the burden of managing development, test, and production environments when deploying software anywhere from a laptop to the cloud. The container encapsulates everything that is required to run a specific Elasticsearch setup.

So in the end there is no exact answer. Try and measure for yourself. You know best what you expect from the system: maybe a certain number of documents indexed per time interval, queries per second, whatever. Elasticsearch helps you because it scales horizontally over the number of machines, just by starting another node. You can run the same tests on each VM setup you want to examine, with and without Docker, and if the key indicators you care about match your expectations, then you should go for that setup.
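A minimal sketch of such a measurement, in Python with only the standard library. It is not an Elasticsearch benchmark: the helper names (`measure`, `make_fsync_writer`) are made up for illustration, and the fsync-heavy write loop is only a stand-in workload for probing raw disk I/O. The idea would be to run the same script once directly on the VM and once inside the container, pointing `directory` at the path where the Elasticsearch data would live, and compare the numbers. For real indexing or query rates you would pass your own callable to `measure` instead.

```python
import math
import os
import statistics
import tempfile
import time


def measure(operation, repetitions=200):
    """Call operation() repeatedly and return throughput and latency figures."""
    latencies = []
    start = time.perf_counter()
    for _ in range(repetitions):
        t0 = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    latencies.sort()
    # Nearest-rank 95th percentile on the sorted latencies.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "ops_per_sec": repetitions / elapsed,
        "median_ms": statistics.median(latencies) * 1000,
        "p95_ms": latencies[p95_index] * 1000,
    }


def make_fsync_writer(directory, block_size=4096):
    """Return a stand-in workload: append one block and force it to disk."""
    path = os.path.join(directory, "probe.bin")
    block = b"\0" * block_size

    def write_once():
        with open(path, "ab") as f:
            f.write(block)
            f.flush()
            os.fsync(f.fileno())  # bypass the page cache so the disk is really hit

    return write_once


if __name__ == "__main__":
    # Assumption: a temporary directory is used here only so the script runs
    # anywhere; for a meaningful comparison use the actual data volume.
    with tempfile.TemporaryDirectory() as directory:
        print(measure(make_fsync_writer(directory)))
```

Run it several times in each environment and compare medians rather than single runs, since one run on a shared host can be noisy.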