Elasticsearch num processors/bulk threads detection

Looking into the code, Elasticsearch imposes a maximum of 32 on processor-based thread pool sizes.

See org.elasticsearch.common.util.concurrent.EsExecutors

    /**
     * Returns the number of processors available but at most <tt>32</tt>.
     */
    public static int boundedNumberOfProcessors(Settings settings) {
        /* This relates to issues where machines with large number of cores
         * ie. >= 48 create too many threads and run into OOM see #3478
         * We just use an 32 core upper-bound here to not stress the system
         * too much with too many created threads */
        int defaultValue = Math.min(32, Runtime.getRuntime().availableProcessors());
        try {
            defaultValue = Integer.parseInt(System.getProperty(DEFAULT_SYSPROP));
        } catch (Throwable ignored) {}
        return settings.getAsInt(PROCESSORS, defaultValue);
    }
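As the snippet shows, 32 is only the *default* upper bound: a value parsed from a system property overrides it, and an explicit `processors` setting overrides both. A minimal sketch of that precedence order (the method and parameter names here are hypothetical; the real code reads the actual `Settings` object and a system property whose name is elided above):

```java
public class BoundedProcessors {
    static final int MAX_BOUND = 32; // hard-coded cap from EsExecutors

    // Lookup order: explicit setting > system property > min(32, available cores)
    static int boundedNumberOfProcessors(Integer settingValue, String syspropValue, int availableProcessors) {
        int defaultValue = Math.min(MAX_BOUND, availableProcessors);
        if (syspropValue != null) {
            try {
                defaultValue = Integer.parseInt(syspropValue);
            } catch (NumberFormatException ignored) {}
        }
        return settingValue != null ? settingValue : defaultValue;
    }

    public static void main(String[] args) {
        System.out.println(boundedNumberOfProcessors(null, null, 40)); // capped to 32
        System.out.println(boundedNumberOfProcessors(40, null, 40));   // explicit setting wins
    }
}
```

So a user on a 40-core machine who wants full-width pools can simply set `processors` explicitly rather than patching the cap.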

The reason for the limit is explained in the code comment above (issue #3478).

While I share the diagnosis, I do not follow why a local limit of 32 can prevent "thread explosions", since Elasticsearch has six thread pools sized proportionally to the available processor count: INDEX, BULK, GET, SEARCH, SUGGEST, PERCOLATE. In addition, there are five thread pools sized at 50% of the processor count, but capped at 5 or 10 threads: LISTENER, FLUSH, REFRESH, WARMER, SNAPSHOT. Finally, there are two pools sized at double the processor count: FETCH_SHARD_STARTED and FETCH_SHARD_STORE. That can result in `6*32 + 3*5 + 2*10 + 2*2*32 = 192 + 15 + 20 + 128 = 355` threads in the ES pools on 32+ core machines, plus the threads run by Netty. Anyway, we can conclude that in most situations there are still enough ES/Netty threads to be scheduled with maximum efficiency by the operating system on any existing CPU hardware.
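The tally above can be sketched as a small calculation. This is my own rough model of the pool sizing described in the paragraph, not code from Elasticsearch itself; the helper name and the exact 5/10 cap split across the five half-sized pools are assumptions:

```java
public class PoolSizes {
    // Rough tally of ES thread pool sizes as a function of processor count
    static int totalPoolThreads(int processors) {
        // six pools sized 1x: INDEX, BULK, GET, SEARCH, SUGGEST, PERCOLATE
        int fullPools = 6 * processors;
        // five pools sized at 50%, capped: assumed three capped at 5, two at 10
        int halfPools = 3 * Math.min(processors / 2, 5)
                      + 2 * Math.min(processors / 2, 10);
        // two pools sized 2x: FETCH_SHARD_STARTED, FETCH_SHARD_STORE
        int doublePools = 2 * (2 * processors);
        return fullPools + halfPools + doublePools;
    }

    public static void main(String[] args) {
        // With the bounded processor count of 32: 192 + 15 + 20 + 128
        System.out.println(totalPoolThreads(32)); // 355
    }
}
```

Even with the cap in place, the per-node thread count grows well past 32, which is why I question how much protection the local limit actually buys.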

Users with machines of more than 32 cores and a very specific or exotic workload pattern might want deeper insight into whether exceeding the pool size limit could also increase efficiency. The throughput of a single action type on a node is bounded by the hard-coded maximum of 32 threads, but I doubt that boundary is really a problem, or even measurable at all.

My machines have core counts of 36 and 40; if I find time, I may run benchmarks with a mixed search/index workload, with the hard-coded limit enabled and disabled.
