Hi,
I am using Elasticsearch version 7.11.2. On a two-node cluster holding a 102 GB index (51 GB primary + 51 GB replica), multiple DSL queries running in the background are driving CPU utilization very high.
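To confirm what is actually running in the background, the task management API can list in-flight search tasks (the action filter below is just one way to narrow it down):

# list running search tasks along with their descriptions
GET _tasks?actions=*search&detailed=true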
I have enabled autoscaling on this cluster, but it doesn't seem to scale up.
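Assuming the autoscaling APIs are available on this deployment, the capacity endpoint should show what the deciders are currently requesting (included here only as a sketch):

# report the capacity the autoscaling deciders currently require
GET _autoscaling/capacity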
Any thoughts on how to mitigate the slow DSL query responses and reduce the CPU load would be highly appreciated.
Adding the nodes, os, process, jvm/gc, and hot_threads output below for reference.
GET /_cat/nodes?v=true
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
IP1 45 100 100 4.64 4.31 4.22 himrst - instance-0
IP2 75 100 100 2.98 2.96 2.81 himrst * instance-1
GET _cat/shards
index_name 0 r STARTED 153619336 51.1gb IP1 instance-0
index_name 0 p STARTED 153619336 51gb IP2 instance-1
GET _nodes/stats
"os" : {
  "timestamp" : 1640075412812,
  "cpu" : {
    "percent" : 100,
    "load_average" : {
      "1m" : 5.09,
      "5m" : 4.82,
      "15m" : 4.66
    }
  },
  "mem" : {
    "total_in_bytes" : 8589934592,
    "free_in_bytes" : 53248,
    "used_in_bytes" : 8589881344,
    "free_percent" : 0,
    "used_percent" : 100
  }
},
"process" : {
  "timestamp" : 1640075412894,
  "open_file_descriptors" : 1519,
  "max_file_descriptors" : 1048576,
  "cpu" : {
    "percent" : 3,
    "total_in_millis" : 1682949290
  },
  "mem" : {
    "total_virtual_in_bytes" : 102227439616
  }
},
"jvm" : {
  "timestamp" : 1640075412902,
  "uptime_in_millis" : 1519168014,
  "mem" : {
    "heap_used_in_bytes" : 2019352576,
    "heap_used_percent" : 47,
    "heap_committed_in_bytes" : 4294967296,
    "heap_max_in_bytes" : 4294967296,
    "non_heap_used_in_bytes" : 307331792,
    "non_heap_committed_in_bytes" : 316825600,
    "pools" : {
      "young" : {
        "used_in_bytes" : 406847488,
        "max_in_bytes" : 0,
        "peak_used_in_bytes" : 2573207552,
        "peak_max_in_bytes" : 0
      },
      "old" : {
        "used_in_bytes" : 1606213632,
        "max_in_bytes" : 4294967296,
        "peak_used_in_bytes" : 3262963712,
        "peak_max_in_bytes" : 4294967296
      },
      "survivor" : {
        "used_in_bytes" : 6291456,
        "max_in_bytes" : 0,
        "peak_used_in_bytes" : 322961408,
        "peak_max_in_bytes" : 0
      }
    }
  },
  "gc" : {
    "collectors" : {
      "young" : {
        "collection_count" : 265734,
        "collection_time_in_millis" : 41312494
      },
      "old" : {
        "collection_count" : 0,
        "collection_time_in_millis" : 0
      }
    }
  }
}
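The hot threads dump below comes from the hot threads API (defaults: 500ms interval, 3 busiest threads):

GET _nodes/hot_threads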
::: {instance-0}{<..>}{<..>}{<..>}{<..>}{himrst}{logical_availability_zone=<>, server_name=instance-0.<>, availability_zone=<>, xpack.installed=true, data=hot, instance_configuration=<>, transform.node=true, region=<>}
   Hot threads at 2021-12-21T08:20:30.432Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   27.9% (139.5ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0][search][T#3]'
     5/10 snapshots sharing following 33 elements
       app//org.elasticsearch.common.util.LongLongHash.add(LongLongHash.java:129)
       app//org.elasticsearch.search.aggregations.bucket.terms.LongKeyedBucketOrds$FromMany.add(LongKeyedBucketOrds.java:199)

   27.1% (135.7ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0][search][T#2]'
     3/10 snapshots sharing following 29 elements
       app//org.apache.lucene.search.ConjunctionDISI.nextDoc(ConjunctionDISI.java:253)
       app//org.apache.lucene.search.DisjunctionDISIApproximation.nextDoc(DisjunctionDISIApproximation.java:55)
       app//org.apache.lucene.search.ConjunctionDISI.nextDoc(ConjunctionDISI.java:253)
       app//org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:254)

   24.4% (121.9ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0][search][T#1]'
     2/10 snapshots sharing following 60 elements
       java.base@15.0.1/sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
       java.base@15.0.1/sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:54)
       java.base@15.0.1/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:274)
       java.base@15.0.1/sun.nio.ch.IOUtil.read(IOUtil.java:245)
       java.base@15.0.1/sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:815)
       java.base@15.0.1/sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:800)
       app//org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:170)
       app//org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:152)
       app//org.apache.lucene.store.DataInput.skipBytes(DataInput.java:338)

   3.5% (17.6ms out of 500ms) cpu usage by thread 'elasticsearch[instance-1][snapshot][T#1]'
     10/10 snapshots sharing following 59 elements
       java.base@15.0.1/sun.nio.ch.Net.poll(Native Method)
Even simple search queries run from Dev Tools are taking a long time.
GET index_name/_search
{
  "took" : 32677,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
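The hot threads above suggest most of the search CPU goes into terms aggregation bucketing, so a representative heavy query probably looks something like the sketch below (purely illustrative; the field names are placeholders, not my actual mapping):

# illustrative only: a filtered terms aggregation of the kind the hot threads point at
GET index_name/_search
{
  "size": 0,
  "query": {
    "range": { "timestamp_field": { "gte": "now-1h" } }
  },
  "aggs": {
    "group_by": {
      "terms": { "field": "some_keyword_field", "size": 1000 }
    }
  }
}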