We are building a new Elasticsearch cluster and are trying to figure out one thing: does running Elasticsearch in a Docker container on each node add another layer of I/O overhead?
There is already an I/O cost from running a VM on a physical host, because the I/O has to traverse the hypervisor. I'm wondering whether adding Docker introduces further I/O delay through the Docker engine.
Of course we could run benchmarks, but I was hoping the Elastic staff and/or others on this forum could share their input.
I am fully aware that container technology involves no hypervisor, whereas Hyper-V or VMware does. The decision this research is meant to inform is whether to run Elasticsearch directly on the VMs or, if there is no performance impact, to run it inside Docker containers on those VMs.
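If you want a quick first data point before setting up a full benchmark, a crude sequential-write probe can be run once directly on the VM and once inside a container against the same disk. This is a minimal sketch, not a substitute for a proper tool such as fio or Elastic's Rally. The function name `measure_write_throughput` is hypothetical, and the comparison assumes the Elasticsearch data directory is bind-mounted into the container so both runs hit the same underlying volume. It only measures large sequential writes followed by a single fsync, which is not Elasticsearch's real I/O pattern.

```python
import os
import tempfile
import time


def measure_write_throughput(directory: str, total_mb: int = 64, block_kb: int = 1024) -> float:
    """Hypothetical probe: write ~total_mb of random data to `directory` in
    block_kb chunks, fsync it to disk, delete the file, and return MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            # Force the data past the page cache so the timing includes the disk.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # leave the data directory as we found it
    megabytes = blocks * block_kb / 1024
    return megabytes / elapsed if elapsed > 0 else float("inf")
```

For example, call `measure_write_throughput("/path/to/es-data")` (a placeholder path) on the VM, then again from inside the container against the mounted copy of that path, and compare the two numbers over several runs.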
I'm opinionated when it comes to VMs. I don't use them because I have enough bare-metal machines available. The JVM that Elasticsearch runs on is already a kind of VM, so use VMs only if you have to. If you want the best performance for the money invested in hardware, VMs are not a good choice. Because hardware is so cheap, people stop caring and use VMs anyway. That is their decision, and it may make economic sense, but it is not the best choice for folks like me who care about optimal performance, tuning, and maximum resource utilization.
Docker containerization relieves the developer of the burden of managing development, test, and production environments when deploying software anywhere from a laptop to the cloud. The container encapsulates everything required to run a specific Elasticsearch setup.
So in the end there is no exact answer. Try it and measure for yourself. You know best what you expect from the system: perhaps a certain number of documents indexed per unit of time, or a certain number of queries per second. Elasticsearch helps here because it scales horizontally across machines; you just start another node. For test runs, you can try every VM configuration you want to evaluate, both with and without Docker, and if the key indicators match your expectations, go with that setup.
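One way to make "do the key indicators match your expectations?" concrete is to record throughput for each configuration and compare it against both a target and a baseline. The sketch below is only an illustration: `RunResult`, `timed_run`, `meets_expectations`, and `relative_overhead` are hypothetical names, and the `operation` callable is assumed to be whatever indexing or search request you issue through your own client.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunResult:
    """Outcome of one timed test run against a given setup."""
    label: str      # e.g. "vm" or "vm+docker"
    ops: int        # number of operations performed
    seconds: float  # wall-clock duration of the run

    @property
    def ops_per_sec(self) -> float:
        # Guard against a zero-length run to avoid division by zero.
        if self.seconds <= 0:
            return float("inf")
        return self.ops / self.seconds


def timed_run(label: str, operation: Callable[[], None], ops: int) -> RunResult:
    """Call `operation` `ops` times sequentially and time the whole loop."""
    start = time.perf_counter()
    for _ in range(ops):
        operation()  # assumed: one index or search request via your client
    return RunResult(label, ops, time.perf_counter() - start)


def meets_expectations(result: RunResult, min_ops_per_sec: float) -> bool:
    """True if the measured throughput reaches the target you set for the system."""
    return result.ops_per_sec >= min_ops_per_sec


def relative_overhead(baseline: RunResult, candidate: RunResult) -> float:
    """Fractional throughput loss of candidate vs baseline (0.2 means 20% slower)."""
    return 1.0 - candidate.ops_per_sec / baseline.ops_per_sec
```

For example, run the same bulk-indexing operation against a plain VM node and a Dockerized node, then check `meets_expectations(docker_result, target)` and `relative_overhead(vm_result, docker_result)`. A single-threaded loop like this understates what a cluster can sustain, so treat it as a relative comparison between setups rather than an absolute capacity figure.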