Elasticsearch keeps timing out in Docker

For some reason, my Elasticsearch server starts timing out as soon as there are 2 or more concurrent connections (while doing no more than ~3 queries per second).

Here's the error I keep getting:

llib3.py", line 122, in perform_request
    raise ConnectionTimeout('TIMEOUT', str(e), e)
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=10))
"""

I'm using docker.elastic.co/elasticsearch/elasticsearch:5.3.0 with the following config file:

cluster.name: "docker-cluster"
network.host: 0.0.0.0

# minimum_master_nodes need to be explicitly set when bound on a public IP
# set to 1 to allow single node clusters
# Details: https://github.com/elastic/elasticsearch/pull/17288
discovery.zen.minimum_master_nodes: 1

# enable CORS
http.cors.enabled: true
http.cors.allow-origin: "*"

# increase the maximum number of clauses allowed in a bool query
indices.query.bool.max_clause_count: 100000

# disable auth
xpack.security.enabled: false
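
A quick way to confirm the node is reachable with this config (illustrative sketch, not my exact code):

from elasticsearch import Elasticsearch

# Quick check that the single-node cluster from the config above is reachable.
es = Elasticsearch(["http://localhost:9200"])
health = es.cluster.health()
print(health["cluster_name"], health["status"], health["number_of_nodes"])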

With only 1 concurrent connection to ES, I don't get any errors.

Some logs:

elasticsearch_1  | [2017-04-28T11:16:52,718][WARN ][o.e.m.j.JvmGcMonitorService] [yfHJjxF] [gc][15976] overhead, spent [1.3m] collecting in the last [1.3m]
elasticsearch_1  | [2017-04-28T11:18:25,981][WARN ][o.e.m.j.JvmGcMonitorService] [yfHJjxF] [gc][old][15982][18] duration [1.4m], collections [1]/[1.4m], total [1.4m]/[18.5m], memory [1.9gb]->[1.8gb]/[1.9gb], all_pools {[young] [266.2mb]->[172.7mb]/[266.2mb]}{[survivor] [26.2mb]->[0b]/[33.2mb]}{[old] [1.6gb]->[1.6gb]/[1.6gb]}
elasticsearch_1  | [2017-04-28T11:18:26,126][WARN ][o.e.m.j.JvmGcMonitorService] [yfHJjxF] [gc][15982] overhead, spent [1.4m] collecting in the last [1.4m]
elasticsearch_1  | [2017-04-28T11:18:28,268][WARN ][o.e.m.j.JvmGcMonitorService] [yfHJjxF] [gc][15984] overhead, spent [829ms] collecting in the last [1.1s]

Does spent [1.4m] collecting in the last [1.4m] mean it wasn't doing anything useful for the last 1.4 minutes?

Yes, that means that the JVM spent nearly 100% of its time performing garbage collection.

If garbage collection ([gc]) is happening that frequently, and for that long (1.4 minutes!), then you have a memory pressure issue: the default 2g heap is not up to the tasks you're asking of it. Increase it to as much as 50% of the available system memory, but no higher than ~30g so the JVM can keep using compressed object pointers. For example, if the system has 32G of RAM, you could set the heap to 16G. With the official Docker image, the heap is typically set via the ES_JAVA_OPTS environment variable, e.g. ES_JAVA_OPTS="-Xms16g -Xmx16g".
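
To see the pressure directly, you can pull JVM stats from the node; something along these lines (an illustrative sketch using the Python client, since that's what you're already using) shows heap usage and time spent in old-generation GC:

from elasticsearch import Elasticsearch

# Illustrative check: report heap usage and old-generation GC time per node.
es = Elasticsearch(["http://localhost:9200"], timeout=30)
stats = es.nodes.stats(metric="jvm")
for node_id, node in stats["nodes"].items():
    jvm = node["jvm"]
    print(
        node_id,
        "heap_used_percent:", jvm["mem"]["heap_used_percent"],
        "old gc time (ms):", jvm["gc"]["collectors"]["old"]["collection_time_in_millis"],
    )

A heap that sits near 100% used while old-gen GC time keeps climbing is exactly the pattern your logs show.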
