We generally recommend scaling horizontally by adding small/medium machines. But we do have a few customers with big machines (e.g. 512GB RAM, 256 cores, etc.) who have deployed multiple Elasticsearch nodes per big machine. There are some caveats and recommendations if you choose this deployment architecture:
- Max heap size for each node instance should be < 32GB, because above that threshold the JVM stops using compressed object pointers, which is actually counterproductive (https://wikis.oracle.com/display/HotSpotInternals/CompressedOops).
- Leave 50% of the memory to file system cache for Lucene.
- While you may have enough RAM to run multiple instances on the same machine, test to see if there is enough CPU/processing power.
- You will also want to check to make sure that the multiple node instances are not competing for disk space or disk IO. Our recommendation is to give all the nodes on the machine a raid0 with lots of disks underneath, or a dedicated disk per node.
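For example, with a dedicated disk per node, each instance's elasticsearch.yml can point its data path at its own disk (the mount points below are only illustrative):
# elasticsearch.1.yml
path.data: /mnt/disk1/elasticsearch/data

# elasticsearch.2.yml
path.data: /mnt/disk2/elasticsearch/data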
- Lower the processors setting (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html#processors) accordingly. Each ES node detects the # of cores available on the machine and is not aware of the other nodes present. With multiple nodes on the same machine, each node can think that it has dedicated access to all the cores, which is problematic because the default thread pool sizes are derived from the core count. So you will want to explicitly specify the # of cores available via the processors setting so that each node does not end up overallocating its thread pools. Roughly # of cores / # of nodes can be a good starting point for each node, as sketched below.
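For example, assuming a hypothetical 32-core machine running 4 node instances, each instance's yml could be capped at its share of the cores (the numbers are only illustrative, not a tuned value):
# elasticsearch.N.yml on a 32-core machine running 4 nodes
processors: 8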
- Keep in mind that multiple nodes also means that network connections, OS file descriptors, and mmap file limits will be shared between the nodes (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html#setup-configuration), so you will want to make sure that there is enough bandwidth and that the limits are set high enough to accommodate all the nodes.
- The more nodes you have on the machine, the more nodes will fail at once if that single server goes down. Also, you will want to make sure that you don’t end up with all copies of a shard on the same machine. You can prevent this by setting cluster.routing.allocation.same_shard.host to true (see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#shards-allocation).
- To ensure cluster stability, each dedicated master node instance should be on its own machine (it can certainly be a much smaller machine, e.g. 4GB of RAM to start with). Both points are sketched below.
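As a rough sketch of the two points above (the node.master / node.data split shown for the dedicated master is an assumption about how you lay out roles, not something specific to this setup):
# on every node instance sharing the big machine
cluster.routing.allocation.same_shard.host: true

# on a dedicated master node running on its own smaller machine
node.master: true
node.data: false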
- Keep in mind that multiple nodes on a machine also means additional complexity in management (e.g. keeping track of different ports, config files, etc.). A good way to manage the configuration for multiple instances is to create a separate elasticsearch.yml file per instance, passing the -Des.config parameter to specify the yml file for each instance on startup:
$ bin/elasticsearch -Des.config=$ES_HOME/config/elasticsearch.1.yml
$ bin/elasticsearch -Des.config=$ES_HOME/config/elasticsearch.2.yml
- Each yml should point to the same cluster name (cluster.name).
- It is helpful to specify meaningful node names (node.name).
- Use explicit port numbers for each node so that they are predictable (e.g. http.port and transport.tcp.port).
- Each node should have its own path.* directories (e.g. path.data, path.logs, path.work, path.plugins) so the nodes will not end up with conflicting folder locations for data, plugins, logs, etc. A per-instance yml is sketched below.
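Putting the per-instance settings together, a sketch of what the first instance's elasticsearch.1.yml might contain (the name, ports and paths are only illustrative):
cluster.name: my_cluster                        # same for every instance
node.name: node1                                # meaningful and unique per instance
http.port: 9201
transport.tcp.port: 9301
path.data: /mnt/disk1/elasticsearch/data
path.logs: /var/log/elasticsearch/node1
path.work: /tmp/elasticsearch/node1
path.plugins: /opt/elasticsearch/node1/plugins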
- Alternatively, when dividing up a large box, it may be cleaner to do so using VMs or control groups (cgroups) so that the nodes don't compete for CPU and RAM, and the file system cache is also kept independent.
- On the topic of file system cache, with the implementation of doc values to store fielddata values on disk (http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/doc-values.html#doc-values), you can also benefit from a larger file system cache. So for a 128GB RAM box, it can make sense to have a < 32GB heap and leave the rest to the FS cache.
Disclaimer: Testing is the only way to determine if your infrastructure can support multiple nodes on a single machine and the optimal # of node instances to use on the machine.