Can Windows Defrag cause a node to become unresponsive?

I have been trying to track down an issue where nodes momentarily become unresponsive, causing the cluster to think they have left. I assumed this was a GC issue and have been tuning with that in mind. I added a bunch of telemetry, and while investigating I noticed something. Five nodes had network blips overnight. During that time I did not see any excessive heap use or GC activity. Looking through the Event Viewer on those nodes, I happened to notice the same error on all five:

Log Name: Application
Source: Microsoft-Windows-Defrag
Date: 2/17/2017 4:27:41 AM
Event ID: 257
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: ocv-es-17
The volume (C:) was not optimized because an error was encountered: Neither Slab Consolidation nor Slab Analysis will run if slabs are less than 8 MB. (0x8900002D)

It would seem to be unrelated, but on all five nodes this error preceded the node dropping off by roughly 2 to 7 minutes:

Node	Down	Up	Window	Defrag	Leadtime
es-17	4:32:36	4:33:40	0:01:04	4:27:00	0:05:36
es-12	4:42:04	4:47:16	0:05:12	4:35:00	0:07:04
es-11	4:53:21	4:55:51	0:02:30	4:51:00	0:02:21
es-13	5:11:16	5:16:21	0:05:05	5:08:00	0:03:16
es-07	1:45:56	1:46:42	0:00:46	1:42:35	0:03:21
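
For anyone who wants to double-check the arithmetic, here is a rough Python sketch that recomputes the Window and Leadtime columns from the Down, Up, and Defrag times in the table. The names `ROWS`, `parse`, and `lead_times` are made up for illustration, and it assumes each row's timestamps fall on the same day.

```python
from datetime import datetime, timedelta

# Times copied from the table: (node, down, up, defrag event).
ROWS = [
    ("es-17", "4:32:36", "4:33:40", "4:27:00"),
    ("es-12", "4:42:04", "4:47:16", "4:35:00"),
    ("es-11", "4:53:21", "4:55:51", "4:51:00"),
    ("es-13", "5:11:16", "5:16:21", "5:08:00"),
    ("es-07", "1:45:56", "1:46:42", "1:42:35"),
]


def parse(clock: str) -> datetime:
    """Parse an H:MM:SS clock time (the date part is irrelevant here)."""
    return datetime.strptime(clock, "%H:%M:%S")


def lead_times(rows):
    """Return {node: (outage window, gap between defrag event and outage)}."""
    result = {}
    for node, down, up, defrag in rows:
        window = parse(up) - parse(down)      # how long the node was gone
        lead = parse(down) - parse(defrag)    # defrag error -> node dropping off
        result[node] = (window, lead)
    return result


if __name__ == "__main__":
    for node, (window, lead) in lead_times(ROWS).items():
        print(f"{node}\twindow={window}\tlead={lead}")
```

On this data the gap between the Defrag error and the node dropping off ranges from 0:02:21 (es-11) to 0:07:04 (es-12).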

I don't see any other instances of the Defrag error this week.

Is there any way that Defrag could be causing problems with Elasticsearch?

