I looked into the code: Elasticsearch has a maximum limit of 32 for processor-based thread pool sizing. See org.elasticsearch.common.util.concurrent.EsExecutors:
/**
 * Returns the number of processors available but at most <tt>32</tt>.
 */
public static int boundedNumberOfProcessors(Settings settings) {
    /* This relates to issues where machines with large number of cores
     * ie. >= 48 create too many threads and run into OOM see #3478
     * We just use an 32 core upper-bound here to not stress the system
     * too much with too many created threads */
    int defaultValue = Math.min(32, Runtime.getRuntime().availableProcessors());
    try {
        defaultValue = Integer.parseInt(System.getProperty(DEFAULT_SYSPROP));
    } catch (Throwable ignored) {}
    return settings.getAsInt(PROCESSORS, defaultValue);
}
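To make the precedence in that method explicit, here is a minimal standalone sketch (my own naming, not Elasticsearch code): an explicit "processors" setting wins over the system property, which in turn wins over the default of min(32, available cores). This also shows that the limit is only a default, not a hard cap.

```java
import java.util.Map;

public class ProcessorBound {
    // Mirrors the precedence of boundedNumberOfProcessors:
    // settings > system property > min(32, available cores).
    static int boundedProcessors(Map<String, String> settings, String syspropValue, int availableCores) {
        int value = Math.min(32, availableCores);          // hard-coded 32-core upper bound
        if (syspropValue != null) {
            try {
                value = Integer.parseInt(syspropValue);    // system property override
            } catch (NumberFormatException ignored) {}
        }
        String fromSettings = settings.get("processors");  // explicit setting wins last
        return fromSettings != null ? Integer.parseInt(fromSettings) : value;
    }
}
```

So on a 48-core machine the default comes out as 32, unless the user overrides it via the system property or the "processors" setting.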
The reason for the limit is explained in the comment above (issue #3478).
While I share the diagnosis, I do not see how a local limit of 32 can prevent "thread explosions". Elasticsearch has six thread pools sized proportionally to the available processor count: INDEX, BULK, GET, SEARCH, SUGGEST, PERCOLATE. In addition, there are five thread pools sized at 50% of the processor count but capped at 5 or 10 threads: LISTENER, FLUSH, REFRESH, WARMER, SNAPSHOT. And finally, there are two pools sized at double the processor count: FETCH_SHARD_STARTED and FETCH_SHARD_STORE. On a machine with 32+ cores that can result in 6*32 + 3*5 + 2*10 + 2*2*32 = 192 + 15 + 20 + 128 = 355 threads in the ES pools, plus the threads run by Netty. Either way, we can conclude that in most situations there are still enough ES/Netty threads for the operating system to schedule with maximum efficiency on any existing CPU hardware.
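The arithmetic above can be sketched as a small Java program. This is my own back-of-the-envelope estimate, not Elasticsearch code; the assignment of which half-sized pools are capped at 5 and which at 10 is an assumption matching the 3*5 + 2*10 breakdown above.

```java
public class PoolSizeEstimate {
    // Rough total of the processor-scaled ES thread pools,
    // with the processor count capped at 32 as in boundedNumberOfProcessors.
    public static int totalThreads(int processors) {
        int p = Math.min(32, processors);
        int fullSized   = 6 * p;                   // INDEX, BULK, GET, SEARCH, SUGGEST, PERCOLATE
        int halfMaxFive = 3 * Math.min(5, p / 2);  // assumed: three pools capped at 5
        int halfMaxTen  = 2 * Math.min(10, p / 2); // assumed: two pools capped at 10
        int doubleSized = 2 * (2 * p);             // FETCH_SHARD_STARTED, FETCH_SHARD_STORE
        return fullSized + halfMaxFive + halfMaxTen + doubleSized;
    }

    public static void main(String[] args) {
        // On a 48-core machine the cap kicks in and the estimate is 355 threads.
        System.out.println(totalThreads(48));
    }
}
```

Note how quickly the total grows with the core count before the cap applies, which is exactly the OOM scenario the 32-core bound is meant to avoid.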
Users with machines of >32 cores and a very specific or exotic workload pattern might want deeper insight into whether exceeding the pool size limit can also increase efficiency. The throughput of a single action type on a node is bound by the hard-coded maximum of 32 threads per pool, but I doubt that this boundary is really a problem, or that its effect can be measured at all.
My machines have core counts of 36 and 40; if I find the time, I may run benchmarks with a mixed search/index workload, with the hard-coded limit enabled and disabled.