High JVM Heap usage in ES load balancer node and Connection Timeout from Client to ElasticSearch load balancer node

  • Search Guard and Elasticsearch version - 5.6.9
  • 32GB of max heap configured
  • Installed and used enterprise modules, if any - No
  • JVM version and operating system version - 1.8, CentOS
  • Other installed Elasticsearch or Kibana plugins, if any - None
  • Configuration: load balancer node and 3 data nodes
  • Load balancer: ES node used for querying and for data ingest

Nginx between Client and ES cluster
50 clients connect to a single Elasticsearch coordinating node every 10 seconds.

Problem Statement:
Client connections fail with the following exception:

message: Error running query: ConnectionTimeout caused by - ConnectTimeout(HTTPSConnectionPool(host='xxxx', port=9200): Max retries exceeded with url: /metrics-*/_search?_source_include=Timestamp%2C%2A&ignore_unavailable=true&scroll=30s&size=10000 (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x2044e90>, 'Connection to xx.yy timed out. (connect timeout=20)')))
num_hits: 19
num_matches: 3
traceback: [
"Traceback (most recent call last):",
" File "/var/lib/elastalert/elastalert-0.1.29/elastalert/elastalert.py", line 390, in get_hits",
" **extra_args",
" File "/usr/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped",
" return func(*args, params=params, **kwargs)",
" File "/usr/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 623, in search",
" doc_type, '_search'), params=params, body=body)",
" File "/usr/lib/python2.7/site-packages/elasticsearch/transport.py", line 312, in perform_request",
" status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)",
" File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_requests.py", line 84, in perform_request",
" raise ConnectionTimeout('TIMEOUT', str(e), e)",
"ConnectionTimeout: ConnectionTimeout caused by - ConnectTimeout(HTTPSConnectionPool(host='xx.yy', port=9200): Max retries exceeded with url: /metrics-*/_search?_source_include=Timestamp%2C%2A&ignore_unavailable=true&scroll=30s&size=10000 (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x2044e90>, 'Connection to xx.yy timed out. (connect timeout=20)')))"
]

Snapshot of the JVM heap after the Search Guard install, when the connection timeouts surfaced.

What is the total heap memory allocated on the client node? What is the percentage usage? Are you seeing any Out of Memory errors in the ES coordinating node logs?

What type of queries are running and how long do they take? I suspect the problem is that there are too many incoming queries for a single coordinating node to handle, rather than heap pressure as such. Try increasing the number of coordinating nodes.
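For the heap questions above, the figures can be read from the node stats API (`GET _nodes/stats/jvm`). The following is a minimal sketch of how the response could be summarized. It assumes the response has already been fetched and parsed into a dict; the node ID, node name, and byte values in `sample` are made up for illustration.

```python
def heap_usage(node_stats):
    """Summarize heap usage per node from a parsed _nodes/stats/jvm response.

    Returns {node_name: (used_bytes, max_bytes, percent_used)}.
    """
    report = {}
    for node_id, node in node_stats["nodes"].items():
        mem = node["jvm"]["mem"]
        used = mem["heap_used_in_bytes"]
        limit = mem["heap_max_in_bytes"]
        percent = round(100.0 * used / limit, 1)
        # Fall back to the node ID if the response carries no name.
        report[node.get("name", node_id)] = (used, limit, percent)
    return report


# Hypothetical response fragment: one node with 24 GB used of a 32 GB heap.
sample = {
    "nodes": {
        "abc123": {
            "name": "lb-node",
            "jvm": {
                "mem": {
                    "heap_used_in_bytes": 24 * 1024 ** 3,
                    "heap_max_in_bytes": 32 * 1024 ** 3,
                }
            },
        }
    }
}

for name, (used, limit, percent) in heap_usage(sample).items():
    print("%s: %.1f%% of heap in use" % (name, percent))
```

Comparing this percentage across the coordinating node and the data nodes would show whether the pressure is limited to the load balancer node.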

Hi Junaid,

Elastalert is the client where I am seeing the failures. Elastalert runs rules against indices in Elasticsearch.
I have approximately 100-110 rule files, which run queries against the ES cluster (load balancer node).

Sample rule file:
index: elastalert_status
filter:
- query_string:
    query: '_type:elastalert_error AND !message: "index pattern matches"'

The 110 rules run every 2 minutes.
The ES coordinating node was working fine and running the queries successfully.
After I installed Search Guard to enable SSL, the client timeouts started and JVM heap usage began to climb. Some rules run successfully and some fail, which suggests the SSL connection itself is not the issue. Any thoughts from your side?
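To put the load figures quoted in this thread side by side, here is a rough back-of-the-envelope calculation. It assumes each rule or client issues exactly one request per interval, which is a lower bound: the failing query uses `scroll=30s`, so a rule can trigger follow-up requests. The function name is only illustrative.

```python
def requests_per_second(count, interval_seconds):
    """Average request rate if `count` requests are spread over the interval."""
    return count / float(interval_seconds)


# Figures as stated in the thread.
rules_rate = requests_per_second(110, 120)   # 110 rules every 2 minutes
clients_rate = requests_per_second(50, 10)   # 50 clients every 10 seconds

print("rules:   %.2f requests/s" % rules_rate)
print("clients: %.2f requests/s" % clients_rate)
```

These are averages. If all rules fire at the start of each 2-minute window, the instantaneous rate is much higher, and with HTTPS every connection that is not reused also pays for a TLS handshake.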

I will consider increasing the number of coordinating nodes, but it's a challenge given my current architecture.

Regarding heap usage: the heap climbs and the node becomes unresponsive, so I have introduced a script that restarts the ES service when heap usage goes beyond 80-90%.
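The restart script itself is not shown in this thread, so the following is only a hypothetical sketch of the decision logic such a watchdog might use. The 85% threshold is picked from the 80-90% range mentioned above. Requiring several consecutive high readings is an added assumption, meant to avoid restarting on a brief peak that garbage collection would clear on its own.

```python
def should_restart(samples, threshold=85.0, consecutive=3):
    """Decide whether to restart based on recent heap-usage percentages.

    `samples` is a list of heap_used_percent readings, oldest first.
    Returns True only when the last `consecutive` readings all exceed
    `threshold` (both values are assumptions, not from the thread).
    """
    if len(samples) < consecutive:
        return False
    return all(s > threshold for s in samples[-consecutive:])


# A sustained climb triggers a restart; a single spike does not.
print(should_restart([70, 86, 90, 92]))
print(should_restart([90, 92, 60]))
```

The readings could come from `jvm.mem.heap_used_percent` in the node stats response. The actual restart command depends on how the service is managed on the host and is left out here.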

The snapshot below shows JVM heap usage before the Search Guard installation (marked in red).

In that case it looks more like an issue with the Search Guard plugin than with vanilla ES. I think the Search Guard community forum would be a better place to ask.

Thanks a lot for your prompt response, Junaid. Let me take this up in the Search Guard forum.
