Elasticsearch Goes Red

I have a single-node Elasticsearch cluster with replication turned off and a significant amount of data in it (about 300MM records). My visualizations load, but slowly. When I open dashboards, however, they not only take much longer (which I understand, since a dashboard fires a query for each underlying visualization) but also error out and put Elasticsearch in a red status. The machine is fairly powerful: 8 cores, 64 GB RAM, 4.8 TB SSD. What part of the configuration would I have to change to fix this? I am assuming 300MM records is not too much for Elasticsearch to handle on one node, so something else must be wrong. Any advice is welcome.

What is the output of the cluster stats API?
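You can fetch it with a plain curl against the HTTP port (localhost:9200 is an assumption based on the defaults; adjust for your setup):

# cluster-wide stats with human-readable sizes
curl -s 'http://localhost:9200/_cluster/stats?human&pretty'

# quick health summary: status plus node/shard counts
curl -s 'http://localhost:9200/_cluster/health?pretty'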

{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "dataseers",
  "timestamp": 1531601516167,
  "status": "yellow",
  "indices": {
    "count": 33,
    "shards": {
      "total": 45,
      "primaries": 45,
      "replication": 0,
      "index": {
        "shards": {
          "min": 1,
          "max": 5,
          "avg": 1.3636363636363635
        },
        "primaries": {
          "min": 1,
          "max": 5,
          "avg": 1.3636363636363635
        },
        "replication": {
          "min": 0,
          "max": 0,
          "avg": 0
        }
      }
    },
    "docs": {
      "count": 390889190,
      "deleted": 61556174
    },
    "store": {
      "size": "337.8gb",
      "size_in_bytes": 362790191389
    },
    "fielddata": {
      "memory_size": "5.5mb",
      "memory_size_in_bytes": 5794136,
      "evictions": 0
    },
    "query_cache": {
      "memory_size": "0b",
      "memory_size_in_bytes": 0,
      "total_count": 0,
      "hit_count": 0,
      "miss_count": 0,
      "cache_size": 0,
      "cache_count": 0,
      "evictions": 0
    },
    "completion": {
      "size": "0b",
      "size_in_bytes": 0
    },
    "segments": {
      "count": 566,
      "memory": "809.3mb",
      "memory_in_bytes": 848656687,
      "terms_memory": "650mb",
      "terms_memory_in_bytes": 681591121,
      "stored_fields_memory": "118.8mb",
      "stored_fields_memory_in_bytes": 124644680,
      "term_vectors_memory": "0b",
      "term_vectors_memory_in_bytes": 0,
      "norms_memory": "1mb",
      "norms_memory_in_bytes": 1053824,
      "points_memory": "34.9mb",
      "points_memory_in_bytes": 36686758,
      "doc_values_memory": "4.4mb",
      "doc_values_memory_in_bytes": 4680304,
      "index_writer_memory": "4.9mb",
      "index_writer_memory_in_bytes": 5165080,
      "version_map_memory": "1.5mb",
      "version_map_memory_in_bytes": 1654897,
      "fixed_bit_set": "12.6kb",
      "fixed_bit_set_memory_in_bytes": 12960,
      "max_unsafe_auto_id_timestamp": 1531531162502,
      "file_sizes": {}
    }
  },
  "nodes": {
    "count": {
      "total": 1,
      "data": 1,
      "coordinating_only": 0,
      "master": 1,
      "ingest": 1
    },
    "versions": [
      "6.3.1"
    ],
    "os": {
      "available_processors": 16,
      "allocated_processors": 16,
      "names": [
        {
          "name": "Linux",
          "count": 1
        }
      ],
      "mem": {
        "total": "62.5gb",
        "total_in_bytes": 67126927360,
        "free": "37.9gb",
        "free_in_bytes": 40721190912,
        "used": "24.5gb",
        "used_in_bytes": 26405736448,
        "free_percent": 61,
        "used_percent": 39
      }
    },
    "process": {
      "cpu": {
        "percent": 0
      },
      "open_file_descriptors": {
        "min": 927,
        "max": 927,
        "avg": 927
      }
    },
    "jvm": {
      "max_uptime": "19.5h",
      "max_uptime_in_millis": 70368444,
      "versions": [
        {
          "version": "1.8.0_171",
          "vm_name": "OpenJDK 64-Bit Server VM",
          "vm_version": "25.171-b10",
          "vm_vendor": "Oracle Corporation",
          "count": 1
        }
      ],
      "mem": {
        "heap_used": "4.6gb",
        "heap_used_in_bytes": 4962910312,
        "heap_max": "7.8gb",
        "heap_max_in_bytes": 8476557312
      },
      "threads": 229
    },
    "fs": {
      "total": "4.2tb",
      "total_in_bytes": 4708418715648,
      "free": "3.7tb",
      "free_in_bytes": 4165199818752,
      "available": "3.7tb",
      "available_in_bytes": 4165199818752
    },
    "plugins": [],
    "network_types": {
      "transport_types": {
        "security4": 1
      },
      "http_types": {
        "security4": 1
      }
    }
  }
}

That looks fine. Is there anything in the Elasticsearch logs?
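If it's a deb/rpm install, the server log is usually /var/log/elasticsearch/<cluster_name>.log (path assumed; a tar.gz install writes to logs/ under the Elasticsearch home instead). Given the cluster_name in the stats above, something like:

# tail the server log; the file name follows cluster_name ("dataseers")
tail -n 200 /var/log/elasticsearch/dataseers.log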

Unfortunately, it has not broken since I posted this. I am trying to see if multiple users hitting the system at the same time is causing it. I will post the logs as soon as it fails again.

It broke again. Here are the logs:

[2018-07-25T10:02:46,072][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_logstash_version_mismatch], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:02:46,072][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_kibana_version_mismatch], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:02:46,072][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_elasticsearch_nodes], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:02:46,072][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_xpack_license_expiration], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:02:46,072][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_elasticsearch_cluster_status], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:02:46,072][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_elasticsearch_version_mismatch], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:02:46,072][ERROR][o.e.x.m.c.n.NodeStatsCollector] [node-1] collector [node_stats] timed out when collecting data
[2018-07-25T10:02:46,076][WARN ][o.e.x.w.e.ExecutionService] [node-1] failed to execute watch [J8OVNfqLSe2NW0TOJPOhzw_logstash_version_mismatch]
[2018-07-25T10:02:46,086][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][1050][14] duration [26.5s], collections [1]/[27.1s], total [26.5s]/[4.8m], memory [15.8gb]->[15.8gb]/[15.8gb], all_pools {[young] [865.3mb]->[865.3mb]/[865.3mb]}{[survivor] [51.3mb]->[57.2mb]/[108.1mb]}{[old] [14.9gb]->[14.9gb]/[14.9gb]}
[2018-07-25T10:02:46,086][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1050] overhead, spent [26.5s] collecting in the last [27.1s]
[2018-07-25T10:03:09,939][ERROR][o.e.x.m.c.i.IndexStatsCollector] [node-1] collector [index-stats] timed out when collecting data
[2018-07-25T10:03:09,946][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][1051][15] duration [23.5s], collections [1]/[23.8s], total [23.5s]/[5.2m], memory [15.8gb]->[15.8gb]/[15.8gb], all_pools {[young] [865.3mb]->[865.3mb]/[865.3mb]}{[survivor] [57.2mb]->[74.2mb]/[108.1mb]}{[old] [14.9gb]->[14.9gb]/[14.9gb]}
[2018-07-25T10:03:09,946][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1051] overhead, spent [23.5s] collecting in the last [23.8s]
[2018-07-25T10:03:36,701][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_kibana_version_mismatch], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:03:36,701][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_logstash_version_mismatch], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:03:36,701][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_xpack_license_expiration], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:03:36,702][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_elasticsearch_nodes], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:03:36,702][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_elasticsearch_version_mismatch], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:03:36,703][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [node-1] collector [cluster_stats] timed out when collecting data
[2018-07-25T10:03:36,702][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_elasticsearch_cluster_status], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:03:36,704][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][1052][16] duration [26.5s], collections [1]/[26.7s], total [26.5s]/[5.6m], memory [15.8gb]->[15.8gb]/[15.8gb], all_pools {[young] [865.3mb]->[865.3mb]/[865.3mb]}{[survivor] [74.2mb]->[84.8mb]/[108.1mb]}{[old] [14.9gb]->[14.9gb]/[14.9gb]}
[2018-07-25T10:03:36,705][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1052] overhead, spent [26.5s] collecting in the last [26.7s]
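Those [gc][old] lines tell the story: each old-generation collection runs for ~25s and reclaims nothing, with memory pinned at [15.8gb]->[15.8gb]/[15.8gb], i.e. the heap stays completely full (note the stats above reported a 7.8gb heap_max, so the heap has apparently been raised since that post). A quick way to watch heap pressure while reproducing the failure is the cat nodes API (same host/port assumption as before):

# per-node heap usage; heap.percent stuck near 100 alongside
# back-to-back old-gen GCs means the JVM is out of heap
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.current,heap.max'

On 6.x the heap size is set via -Xms/-Xmx in config/jvm.options; the usual guidance is to keep it at no more than ~50% of RAM and below ~32gb so compressed object pointers stay enabled.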
