Cluster timeouts during various requests

Elasticsearch version (bin/elasticsearch --version):

5.6.3

Plugins installed:
x-pack
mapper-size

JVM version (java -version):
Oracle JRE 1.8.0_102

OS version (uname -a if on a Unix-like system):
Debian Wheezy
Linux esdb2 4.4.82-ls-2 #2 SMP Tue Aug 22 16:46:15 CEST 2017 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
Many requests time out. For example:

  • mapping
  • allocation
  • snapshot create/restore

Provide logs (if relevant):

[2017-11-07T13:45:40,315][DEBUG][o.e.a.a.i.m.p.TransportPutMappingAction] [esdb2] failed to put mappings on indices [[[lslogbeat-2017.11.07/ybyWgKnaRd23NnBxbj7IOg]]], type [1510057956]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$null$0(ClusterService.java:255) ~[elasticsearch-5.6.3.jar:5.6.3]
    at java.util.ArrayList.forEach(ArrayList.java:1249) ~[?:1.8.0_102]
    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$onTimeout$1(ClusterService.java:254) ~[elasticsearch-5.6.3.jar:5.6.3]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.3.jar:5.6.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_102]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_102]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_102]
[the same put-mapping ProcessClusterEventTimeoutException is logged three more times at the same timestamp]
[2017-11-07T13:45:40,600][WARN ][o.e.c.s.ClusterService   ] [esdb2] cluster state update task [put-mapping[1510057717, 1510057717, 1510057717, 1510057717, 1510057717, 1510057717, 1510057717, 1510057717, 1510057954, 1510057954, 1510057954, 1510057971, 1510058317, 1510058318, 1510057972, 1510057953, 1510057973, 1510057976, 1510057954, 1510057954, 1510058191, 1510058191, 1510058098, 1510058099, 1510058192, 1510058193, 1510058196, 1510058312, 1510058313, 1510058087, 1510058197, 1510058318, 1510058088, 1510058319, 1510057728]] took [2m] above the warn threshold of 30s
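The warn line above shows 35 put-mapping tasks batched into a single cluster state update that took 2 minutes. Cluster state updates run on a single-threaded queue on the master, and each queued task carries its own 30 s deadline while it waits, so one slow batch is enough to expire everything queued behind it. A toy model of that queueing effect (not Elasticsearch code, just an illustration of the arithmetic):

```python
# Toy model: queued cluster-state tasks expire individually if they
# wait longer than the 30 s put-mapping timeout.
TIMEOUT = 30.0

def count_expired(wait_times):
    """wait_times: seconds each queued task spent waiting for the
    single-threaded master queue to reach it. A task expires if it
    waited longer than TIMEOUT."""
    return sum(1 for wait in wait_times if wait > TIMEOUT)

# 35 put-mapping tasks all stuck behind one 2-minute update:
waits = [120.0] * 35
print(count_expired(waits))  # all 35 expire
```

This is why the DEBUG timeouts and the 2 m warn line describe the same incident: the tasks did not fail, they simply could not be reached within their deadline.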

What is the full output of the cluster stats API?

Thank you for the reply. Stats output:

{
    "_nodes": {
        "total": 6,
        "successful": 6,
        "failed": 0
    },
    "cluster_name": "esdb",
    "timestamp": 1510068170859,
    "status": "green",
    "indices": {
        "count": 484,
        "shards": {
            "total": 1934,
            "primaries": 1884,
            "replication": 0.02653927813163482,
            "index": {
                "shards": {
                    "min": 1,
                    "max": 5,
                    "avg": 3.9958677685950414
                },
                "primaries": {
                    "min": 1,
                    "max": 5,
                    "avg": 3.8925619834710745
                },
                "replication": {
                    "min": 0,
                    "max": 3,
                    "avg": 0.10330578512396695
                }
            }
        },
        "docs": {
            "count": 15340225039,
            "deleted": 92405
        },
        "store": {
            "size": "8.3tb",
            "size_in_bytes": 9233638509426,
            "throttle_time": "0s",
            "throttle_time_in_millis": 0
        },
        "fielddata": {
            "memory_size": "5.7mb",
            "memory_size_in_bytes": 6000848,
            "evictions": 0
        },
        "query_cache": {
            "memory_size": "3.9gb",
            "memory_size_in_bytes": 4278054627,
            "total_count": 24743883,
            "hit_count": 7023806,
            "miss_count": 17720077,
            "cache_size": 65992,
            "cache_count": 260237,
            "evictions": 194245
        },
        "completion": {
            "size": "0b",
            "size_in_bytes": 0
        },
        "segments": {
            "count": 44250,
            "memory": "15.7gb",
            "memory_in_bytes": 16954793886,
            "terms_memory": "13gb",
            "terms_memory_in_bytes": 14061932355,
            "stored_fields_memory": "1.9gb",
            "stored_fields_memory_in_bytes": 2063181288,
            "term_vectors_memory": "0b",
            "term_vectors_memory_in_bytes": 0,
            "norms_memory": "116.1kb",
            "norms_memory_in_bytes": 118976,
            "points_memory": "245.3mb",
            "points_memory_in_bytes": 257298999,
            "doc_values_memory": "545.7mb",
            "doc_values_memory_in_bytes": 572262268,
            "index_writer_memory": "17.6mb",
            "index_writer_memory_in_bytes": 18497368,
            "version_map_memory": "798.1kb",
            "version_map_memory_in_bytes": 817334,
            "fixed_bit_set": "1kb",
            "fixed_bit_set_memory_in_bytes": 1080,
            "max_unsafe_auto_id_timestamp": 1510012814788,
            "file_sizes": {}
        }
    },
    "nodes": {
        "count": {
            "total": 6,
            "data": 6,
            "coordinating_only": 0,
            "master": 6,
            "ingest": 6
        },
        "versions": [
            "5.6.3"
        ],
        "os": {
            "available_processors": 208,
            "allocated_processors": 240,
            "names": [
                {
                    "name": "Linux",
                    "count": 6
                }
            ],
            "mem": {
                "total": "375.8gb",
                "total_in_bytes": 403552321536,
                "free": "7.1gb",
                "free_in_bytes": 7722110976,
                "used": "368.6gb",
                "used_in_bytes": 395830210560,
                "free_percent": 2,
                "used_percent": 98
            }
        },
        "process": {
            "cpu": {
                "percent": 8
            },
            "open_file_descriptors": {
                "min": 1155,
                "max": 2558,
                "avg": 1652
            }
        },
        "jvm": {
            "max_uptime": "8.2d",
            "max_uptime_in_millis": 711771755,
            "versions": [
                {
                    "version": "1.8.0_102",
                    "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
                    "vm_version": "25.102-b14",
                    "vm_vendor": "Oracle Corporation",
                    "count": 6
                }
            ],
            "mem": {
                "heap_used": "96.9gb",
                "heap_used_in_bytes": 104082771384,
                "heap_max": "178.7gb",
                "heap_max_in_bytes": 191982469120
            },
            "threads": 2575
        },
        "fs": {
            "total": "27.2tb",
            "total_in_bytes": 29922098896896,
            "free": "17.5tb",
            "free_in_bytes": 19297268899840,
            "available": "17.5tb",
            "available_in_bytes": 19297268899840,
            "spins": "true"
        },
        "plugins": [
            {
                "name": "x-pack",
                "version": "5.6.3",
                "description": "Elasticsearch Expanded Pack Plugin",
                "classname": "org.elasticsearch.xpack.XPackPlugin",
                "has_native_controller": true
            },
            {
                "name": "mapper-size",
                "version": "5.6.3",
                "description": "The Mapper Size plugin allows document to record their uncompressed size at index time.",
                "classname": "org.elasticsearch.plugin.mapper.MapperSizePlugin",
                "has_native_controller": false
            }
        ],
        "network_types": {
            "transport_types": {
                "netty4": 6
            },
            "http_types": {
                "netty4": 6
            }
        }
    }
}
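A few ratios from the stats above are worth computing, since a large cluster state (many indices and shards, big mappings) is a common cause of slow put-mapping updates. A quick back-of-the-envelope check, with the figures copied straight from the response:

```python
# Figures copied from the cluster stats response above.
shards = 1934
nodes = 6
store_bytes = 9233638509426          # indices.store.size_in_bytes
segments_memory_bytes = 16954793886  # indices.segments.memory_in_bytes
heap_max_bytes = 191982469120        # nodes.jvm.mem.heap_max_in_bytes

print(f"shards per node: {shards / nodes:.0f}")
print(f"avg shard size:  {store_bytes / shards / 2**30:.1f} GiB")
print(f"segment memory:  {segments_memory_bytes / heap_max_bytes:.0%} of total heap")
```

Roughly 320 shards per node on a six-node, all-master cluster means every cluster state update touches a lot of metadata, which lines up with put-mapping batches taking minutes.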

Are you using dynamic mappings? What is the size of your mappings?

What do you mean by the "size" of mappings?

And yes, in most cases. We are using ES to store logs from syslog and Logstash,
so there are many dynamic mappings.
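By the "size" of mappings, the question presumably refers to how large the mapping metadata itself has grown. With dynamic mapping, every field name never seen before is added to the index mapping permanently, so varied log payloads (such as a misbehaving agent embedding data in field names) make the mapping, and with it the cluster state that each put-mapping update must publish, grow without bound. A sketch of that effect, with invented field names purely for illustration:

```python
# Sketch: a dynamic mapping grows by one entry per never-seen field name.
def grow_mapping(docs):
    """Return the set of field names a dynamic mapping would accumulate
    after indexing the given documents."""
    mapping = set()
    for doc in docs:
        mapping.update(doc.keys())  # each new key becomes a mapped field
    return mapping

well_behaved = [{"host": "a", "msg": "up"}, {"host": "b", "msg": "down"}]
# A buggy agent that embeds variable data in the field names themselves:
buggy = [{f"metric_{i}": 1} for i in range(1000)]

print(len(grow_mapping(well_behaved)))  # 2 fields: mapping stays small
print(len(grow_mapping(buggy)))         # 1000 fields: mapping explodes
```

This is consistent with the eventual diagnosis below: a buggy Beats daemon flooding the cluster with garbage would trigger exactly this kind of runaway mapping growth and the resulting put-mapping timeouts.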

All the timeouts in the cluster were caused by a buggy Beats daemon, probably some bug that overloaded the cluster with garbage. I stopped all the Beats and everything is OK now. The syslog Logstash instances are indexing data with no problem.

So I'm closing this.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.