I set up a new small cluster:
2 x r3.large boxes (2 cores, 15GB RAM each). One is master+data, the second is data-only. Heap is 8GB on both.
I reindexed a smallish (1.5GB) index from a cluster we're moving away from, as a starting point to test with.
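For context, the reindex was done with a request along these lines (a sketch only; host and index names here are placeholders, not the real ones):

```
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://old-cluster.example.com:9200"
    },
    "index": "my-index"
  },
  "dest": {
    "index": "my-index"
  }
}
```

The reindex itself completed without errors.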
In Kibana (installed on a 3rd server in the same cluster), if I open Discover and set the time picker to "This week" (with an index pattern that matches only this single index):
a) The query takes 7-10 minutes to run
b) The Monitoring and Management pages become unresponsive (in the same or a different browser tab, until the query has completed) and dump a series of "timed out" error messages after 30s
c) Only one node (it varies between the two) seems to handle the query, and emits a GC-overhead warning every second for the duration; e.g.:
[2017-01-27T21:29:48,477][WARN ][o.e.m.j.JvmGcMonitorService] [node_001] [gc] overhead, spent [622ms] collecting in the last [1s]
[2017-01-27T21:29:49,498][WARN ][o.e.m.j.JvmGcMonitorService] [node_001] [gc] overhead, spent [601ms] collecting in the last [1s]
[2017-01-27T21:29:50,520][WARN ][o.e.m.j.JvmGcMonitorService] [node_001] [gc] overhead, spent [607ms] collecting in the last [1s]
d) CPU on the node doing the work goes to 100% for the duration
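While the query is running, I can grab a hot-threads dump from the busy node; something like this, if it would help diagnose where the CPU time is going (happy to post the output):

```
GET _nodes/hot_threads?threads=5&interval=1s
```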
Looking at the monitoring page in retrospect, there's no heap pressure at all (around 2GB used of the 8GB heap, perhaps expected given the index size).
What's going on with this cluster, and how do I debug it further? Something is obviously misconfigured; there should be plenty of headroom.
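One thing I'm considering as a next step is enabling the search slow log on the index, to see whether the time is spent in the query or the fetch phase. The thresholds below are just a guess, and "my-index" is a placeholder for the real index name:

```
PUT /my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}
```

Is that the right avenue, or is there something more obvious to check first?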