Elasticsearch Goes Red

I have a single-node Elasticsearch cluster with replication turned off and a significant amount of data in it (about 300MM records). My visualizations load, but slowly. When I open dashboards, however, they not only take much longer (which I understand, since a dashboard fires a query for each underlying visualization) but also error out and put Elasticsearch in a red status. The machine is fairly powerful: 8 cores, 64 GB RAM, 4.8 TB SSD. What part of the configuration would I have to change to fix this? I am assuming 300MM records is not too much for Elasticsearch to handle on one node, so something else must be wrong. Any advice is welcome.

What is the output of the cluster stats API?
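You can fetch it with a plain curl against the HTTP port (localhost:9200 is an assumption based on the defaults; adjust for your setup):

# cluster-wide stats with human-readable sizes
curl -s 'http://localhost:9200/_cluster/stats?human&pretty'

# quick health summary: status plus node/shard counts
curl -s 'http://localhost:9200/_cluster/health?pretty'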

{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "dataseers",
  "timestamp": 1531601516167,
  "status": "yellow",
  "indices": {
    "count": 33,
    "shards": {
      "total": 45,
      "primaries": 45,
      "replication": 0,
      "index": {
        "shards": {
          "min": 1,
          "max": 5,
          "avg": 1.3636363636363635
        },
        "primaries": {
          "min": 1,
          "max": 5,
          "avg": 1.3636363636363635
        },
        "replication": {
          "min": 0,
          "max": 0,
          "avg": 0
        }
      }
    },
    "docs": {
      "count": 390889190,
      "deleted": 61556174
    },
    "store": {
      "size": "337.8gb",
      "size_in_bytes": 362790191389
    },
    "fielddata": {
      "memory_size": "5.5mb",
      "memory_size_in_bytes": 5794136,
      "evictions": 0
    },
    "query_cache": {
      "memory_size": "0b",
      "memory_size_in_bytes": 0,
      "total_count": 0,
      "hit_count": 0,
      "miss_count": 0,
      "cache_size": 0,
      "cache_count": 0,
      "evictions": 0
    },
    "completion": {
      "size": "0b",
      "size_in_bytes": 0
    },
    "segments": {
      "count": 566,
      "memory": "809.3mb",
      "memory_in_bytes": 848656687,
      "terms_memory": "650mb",
      "terms_memory_in_bytes": 681591121,
      "stored_fields_memory": "118.8mb",
      "stored_fields_memory_in_bytes": 124644680,
      "term_vectors_memory": "0b",
      "term_vectors_memory_in_bytes": 0,
      "norms_memory": "1mb",
      "norms_memory_in_bytes": 1053824,
      "points_memory": "34.9mb",
      "points_memory_in_bytes": 36686758,
      "doc_values_memory": "4.4mb",
      "doc_values_memory_in_bytes": 4680304,
      "index_writer_memory": "4.9mb",
      "index_writer_memory_in_bytes": 5165080,
      "version_map_memory": "1.5mb",
      "version_map_memory_in_bytes": 1654897,
      "fixed_bit_set": "12.6kb",
      "fixed_bit_set_memory_in_bytes": 12960,
      "max_unsafe_auto_id_timestamp": 1531531162502,
      "file_sizes": {}
    }
  },
  "nodes": {
    "count": {
      "total": 1,
      "data": 1,
      "coordinating_only": 0,
      "master": 1,
      "ingest": 1
    },
    "versions": [
      "6.3.1"
    ],
    "os": {
      "available_processors": 16,
      "allocated_processors": 16,
      "names": [
        {
          "name": "Linux",
          "count": 1
        }
      ],
      "mem": {
        "total": "62.5gb",
        "total_in_bytes": 67126927360,
        "free": "37.9gb",
        "free_in_bytes": 40721190912,
        "used": "24.5gb",
        "used_in_bytes": 26405736448,
        "free_percent": 61,
        "used_percent": 39
      }
    },
    "process": {
      "cpu": {
        "percent": 0
      },
      "open_file_descriptors": {
        "min": 927,
        "max": 927,
        "avg": 927
      }
    },
    "jvm": {
      "max_uptime": "19.5h",
      "max_uptime_in_millis": 70368444,
      "versions": [
        {
          "version": "1.8.0_171",
          "vm_name": "OpenJDK 64-Bit Server VM",
          "vm_version": "25.171-b10",
          "vm_vendor": "Oracle Corporation",
          "count": 1
        }
      ],
      "mem": {
        "heap_used": "4.6gb",
        "heap_used_in_bytes": 4962910312,
        "heap_max": "7.8gb",
        "heap_max_in_bytes": 8476557312
      },
      "threads": 229
    },
    "fs": {
      "total": "4.2tb",
      "total_in_bytes": 4708418715648,
      "free": "3.7tb",
      "free_in_bytes": 4165199818752,
      "available": "3.7tb",
      "available_in_bytes": 4165199818752
    },
    "plugins": [],
    "network_types": {
      "transport_types": {
        "security4": 1
      },
      "http_types": {
        "security4": 1
      }
    }
  }
}

That looks fine. Is there anything in the Elasticsearch logs?
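If it's a deb/rpm install, the server log is usually /var/log/elasticsearch/<cluster_name>.log (path assumed; a tar.gz install writes to logs/ under the Elasticsearch home instead). Given the cluster_name in the stats above, something like:

# tail the server log; the file name follows cluster_name ("dataseers")
tail -n 200 /var/log/elasticsearch/dataseers.log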

Unfortunately, it has not broken since I posted this. I am trying to see if multiple users hitting the system at the same time is causing it. I will post the logs as soon as it fails again.

It broke again. Here are the logs:

[2018-07-25T10:02:46,072][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_logstash_version_mismatch], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:02:46,072][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_kibana_version_mismatch], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:02:46,072][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_elasticsearch_nodes], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:02:46,072][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_xpack_license_expiration], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:02:46,072][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_elasticsearch_cluster_status], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:02:46,072][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_elasticsearch_version_mismatch], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:02:46,072][ERROR][o.e.x.m.c.n.NodeStatsCollector] [node-1] collector [node_stats] timed out when collecting data
[2018-07-25T10:02:46,076][WARN ][o.e.x.w.e.ExecutionService] [node-1] failed to execute watch [J8OVNfqLSe2NW0TOJPOhzw_logstash_version_mismatch]
[2018-07-25T10:02:46,086][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][1050][14] duration [26.5s], collections [1]/[27.1s], total [26.5s]/[4.8m], memory [15.8gb]->[15.8gb]/[15.8gb], all_pools {[young] [865.3mb]->[865.3mb]/[865.3mb]}{[survivor] [51.3mb]->[57.2mb]/[108.1mb]}{[old] [14.9gb]->[14.9gb]/[14.9gb]}
[2018-07-25T10:02:46,086][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1050] overhead, spent [26.5s] collecting in the last [27.1s]
[2018-07-25T10:03:09,939][ERROR][o.e.x.m.c.i.IndexStatsCollector] [node-1] collector [index-stats] timed out when collecting data
[2018-07-25T10:03:09,946][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][1051][15] duration [23.5s], collections [1]/[23.8s], total [23.5s]/[5.2m], memory [15.8gb]->[15.8gb]/[15.8gb], all_pools {[young] [865.3mb]->[865.3mb]/[865.3mb]}{[survivor] [57.2mb]->[74.2mb]/[108.1mb]}{[old] [14.9gb]->[14.9gb]/[14.9gb]}
[2018-07-25T10:03:09,946][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1051] overhead, spent [23.5s] collecting in the last [23.8s]
[2018-07-25T10:03:36,701][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_kibana_version_mismatch], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:03:36,701][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_logstash_version_mismatch], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:03:36,701][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_xpack_license_expiration], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:03:36,702][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_elasticsearch_nodes], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:03:36,702][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_elasticsearch_version_mismatch], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:03:36,703][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [node-1] collector [cluster_stats] timed out when collecting data
[2018-07-25T10:03:36,702][ERROR][o.e.x.w.i.s.ExecutableSearchInput] [node-1] failed to execute [search] input for watch [J8OVNfqLSe2NW0TOJPOhzw_elasticsearch_cluster_status], reason [java.util.concurrent.TimeoutException: Timeout waiting for task.]
[2018-07-25T10:03:36,704][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][1052][16] duration [26.5s], collections [1]/[26.7s], total [26.5s]/[5.6m], memory [15.8gb]->[15.8gb]/[15.8gb], all_pools {[young] [865.3mb]->[865.3mb]/[865.3mb]}{[survivor] [74.2mb]->[84.8mb]/[108.1mb]}{[old] [14.9gb]->[14.9gb]/[14.9gb]}
[2018-07-25T10:03:36,705][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1052] overhead, spent [26.5s] collecting in the last [26.7s]
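Those [gc][old] lines tell the story: each old-generation collection runs for ~25s and reclaims nothing, with memory pinned at [15.8gb]->[15.8gb]/[15.8gb], i.e. the heap stays completely full (note the stats above reported a 7.8gb heap_max, so the heap has apparently been raised since that post). A quick way to watch heap pressure while reproducing the failure is the cat nodes API (same host/port assumption as before):

# per-node heap usage; heap.percent stuck near 100 alongside
# back-to-back old-gen GCs means the JVM is out of heap
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.current,heap.max'

On 6.x the heap size is set via -Xms/-Xmx in config/jvm.options; the usual guidance is to keep it at no more than ~50% of RAM and below ~32gb so compressed object pointers stay enabled.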
