I have been trying to track down an issue where nodes become momentarily unresponsive, causing the cluster to think they have left. I assumed this was a GC issue and have been tuning with that in mind. I added a bunch of telemetry, and while investigating I noticed something. Five nodes had network blips overnight, but during that time I did not see any excessive heap use or GC activity. Looking through the Event Viewer on those nodes, I happened to notice the same error on all five:
Log Name: Application
Source: Microsoft-Windows-Defrag
Date: 2/17/2017 4:27:41 AM
Event ID: 257
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: ocv-es-17
Description:
The volume (C:) was not optimized because an error was encountered: Neither Slab Consolidation nor Slab Analysis will run if slabs are less than 8 MB. (0x8900002D)
It would seem to be unrelated, but on all five nodes this error preceded the node dropping off by roughly 2 to 7 minutes:
Node   Down     Up       Window   Defrag   Leadtime
es-17  4:32:36  4:33:40  0:01:04  4:27:00  0:05:36
es-12  4:42:04  4:47:16  0:05:12  4:35:00  0:07:04
es-11  4:53:21  4:55:51  0:02:30  4:51:00  0:02:21
es-13  5:11:16  5:16:21  0:05:05  5:08:00  0:03:16
es-07  1:45:56  1:46:42  0:00:46  1:42:35  0:03:21
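For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the Window (Up minus Down) and Leadtime (Down minus Defrag) columns from the times in the table. The names `EVENTS`, `parse` and `summarize` are just illustrative. Note that the Defrag times are used as listed in the table; the event log entry for es-17 shows 4:27:41, so some of these appear to be rounded to the minute and the lead times are approximate.

```python
from datetime import datetime

# Times copied from the table: (node, down, up, defrag error).
EVENTS = [
    ("es-17", "4:32:36", "4:33:40", "4:27:00"),
    ("es-12", "4:42:04", "4:47:16", "4:35:00"),
    ("es-11", "4:53:21", "4:55:51", "4:51:00"),
    ("es-13", "5:11:16", "5:16:21", "5:08:00"),
    ("es-07", "1:45:56", "1:46:42", "1:42:35"),
]


def parse(clock):
    """Parse an H:MM:SS clock time; the date part is irrelevant here."""
    return datetime.strptime(clock, "%H:%M:%S")


def summarize(events):
    """Return (node, outage window, defrag lead time) for each row."""
    rows = []
    for node, down, up, defrag in events:
        window = parse(up) - parse(down)    # how long the node was gone
        lead = parse(down) - parse(defrag)  # defrag error -> node drop
        rows.append((node, window, lead))
    return rows


ROWS = summarize(EVENTS)
for node, window, lead in ROWS:
    print(f"{node}  window={window}  leadtime={lead}")

leads = [lead for _, _, lead in ROWS]
print(f"lead time range: {min(leads)} to {max(leads)}")
```

This assumes all times in a row fall on the same day, which holds for every row in the table.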
I don't see any other instances of the Defrag error this week.
Is there any way that Defrag could be causing problems with Elasticsearch?
Thanks,
~john