OK, having reduced Datadog's polling to once every 30 seconds, we are still seeing this issue (it happened again both last night and today).
Running:
curl http://localhost:9200/_cluster/pending_tasks
returns:
{"tasks":[]}
Running:
curl http://localhost:9200/_tasks
hangs (no response after 10 minutes).
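For anyone wanting to repeat these two checks without a terminal hanging for minutes, here is a minimal sketch that issues the same GET requests with a timeout. The function names `probe` and `probe_all`, the 10 second default, and the injectable `fetch` parameter are my own choices for illustration, not anything from Elasticsearch or Datadog.

```python
import socket
import urllib.error
import urllib.request


def probe(url, timeout_s=10.0, fetch=urllib.request.urlopen):
    """GET one URL and classify the outcome as 'ok', 'timeout' or 'error'.

    timeout_s applies to each blocking socket operation (connect, read),
    so it bounds a silent hang rather than the total request time.
    `fetch` is a hypothetical hook so the function can be exercised offline.
    """
    try:
        with fetch(url, timeout=timeout_s) as resp:
            return ("ok", resp.read().decode("utf-8", errors="replace"))
    except (socket.timeout, TimeoutError) as exc:
        # Read-phase timeouts surface directly.
        return ("timeout", str(exc))
    except urllib.error.URLError as exc:
        # Connect-phase timeouts and refused connections are wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return ("timeout", str(exc.reason))
        return ("error", str(exc.reason))


def probe_all(base, paths, timeout_s=10.0, fetch=urllib.request.urlopen):
    """Probe several paths under one base URL; return {path: status}."""
    return {p: probe(base + p, timeout_s, fetch)[0] for p in paths}
```

Calling `probe_all("http://localhost:9200", ["/_cluster/pending_tasks", "/_tasks"])` on an affected node should, if the behaviour matches what we see with curl, report `ok` for the first path and `timeout` for the second.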
A truncated hot threads output is shown below (this heavy usage occurs across our servers, although only one is shown in the snippet).
Hot threads at 2018-07-24T07:41:05.178Z, interval=500ms, busiestThreads=20, ignoreIdleThreads=true:
100.4% (502.1ms out of 500ms) cpu usage by thread 'elasticsearch[server1-1][management][T#1]'
3/10 snapshots sharing following 19 elements
java.util.TreeMap.getEntry(TreeMap.java:359)
java.util.TreeMap.get(TreeMap.java:278)
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms(PerFieldPostingsFormat.java:315)
org.elasticsearch.search.suggest.completion.CompletionFieldStats.completionStats(CompletionFieldStats.java:54)
org.elasticsearch.index.shard.IndexShard.completionStats(IndexShard.java:743)
org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:207)
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:163)
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:47)
...
2/10 snapshots sharing following 38 elements
java.io.UnixFileSystem.canonicalize0(Native Method)
java.io.UnixFileSystem.canonicalize(UnixFileSystem.java:172)
java.io.File.getCanonicalPath(File.java:618)
java.io.FilePermission$1.run(FilePermission.java:215)
java.io.FilePermission$1.run(FilePermission.java:203)
java.security.AccessController.doPrivileged(Native Method)
java.io.FilePermission.init(FilePermission.java:203)
java.io.FilePermission.<init>(FilePermission.java:277)
java.lang.SecurityManager.checkRead(SecurityManager.java:888)
sun.nio.fs.UnixPath.checkRead(UnixPath.java:795)
sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:49)
sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
java.nio.file.Files.readAttributes(Files.java:1737)
java.nio.file.Files.size(Files.java:2332)
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:67)
org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:67)
....
5/10 snapshots sharing following 13 elements
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:163)
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:47)
org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:433)
org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:412)
org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:399)
org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
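Since the full output is long, a small script can help compare nodes by listing, for each hot thread, how many snapshots share each stack and what the top frame is. This is only a sketch: the regular expressions are inferred from the snippet above rather than from any documented format, and `summarize_hot_threads` is a name I made up.

```python
import re

# Patterns inferred from the pasted output; treat them as assumptions.
THREAD = re.compile(r"cpu usage by thread '([^']+)'")
HEADER = re.compile(r"^(\d+)/(\d+) snapshots sharing following (\d+) elements")


def summarize_hot_threads(text):
    """Return (thread_name, snapshots, total_snapshots, top_frame) tuples."""
    results = []
    thread = None
    pending = None  # (snapshots, total) still waiting for its first frame
    for raw in text.splitlines():
        line = raw.strip()
        m = THREAD.search(line)
        if m:
            thread = m.group(1)
            pending = None
            continue
        m = HEADER.match(line)
        if m:
            pending = (int(m.group(1)), int(m.group(2)))
            continue
        # The first non-empty, non-elided line after a header is the top frame.
        if pending and line and not line.startswith("..."):
            results.append((thread, pending[0], pending[1], line))
            pending = None
    return results
```

Run against the snippet above, this would report the management thread three times, with top frames in `TreeMap.getEntry`, `UnixFileSystem.canonicalize0` and `TransportIndicesStatsAction.shardOperation`, all of which sit under the indices stats action in the traces shown.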