High CPU utilization on master node

I recycled the pods and ran the process again. Can you please have a look at the logs again?

The snapshot started at 2021-12-02T14:41:01,809Z; you've only shared logs from one data node and they only start at 2021-12-02T14:46:05,544Z.

Hi @DavidTurner , here are the logs again. The file is huge (308 MB) and I was unable to create a gist, so I have created a repo and uploaded the file there. I would appreciate it if you could look at the logs.

It seems they are too big for GitHub too; at least, it's using some large-file feature that I don't have installed (and won't be spending time installing either):

$ git clone git@github.com:guptaparv-rlr/snapshot.git
Cloning into 'snapshot'...
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 5 (delta 0), reused 5 (delta 0), pack-reused 0
Receiving objects: 100% (5/5), done.
$ ls -al snapshot
total 24
drwxr-xr-x   6 davidturner  staff  192  8 Dec 07:48 .
drwxr-xr-x   5 davidturner  staff  160  8 Dec 07:48 ..
drwxr-xr-x  12 davidturner  staff  384  8 Dec 07:48 .git
-rw-r--r--   1 davidturner  staff  143  8 Dec 07:48 .gitattributes
-rwxr-xr-x   1 davidturner  staff  134  8 Dec 07:48 data-1.json
-rwxr-xr-x   1 davidturner  staff  131  8 Dec 07:48 master-1.json
$ cat snapshot/data-1.json
version https://git-lfs.github.com/spec/v1
oid sha256:b103a5ac01171adfe54e8ad8a1a8d2af4576b60f2c458e0609ed21eef1b7df52
size 323753450

Could you just pick out the time range for the one specific snapshot and then gzip the files first?
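For example, something along these lines should do it (a rough sketch: the end-of-snapshot timestamp below is just a placeholder, and I'm assuming the data node log is the data-1.json file from your repo):

$ # keep only the lines from the snapshot start up to roughly when it finished,
$ # then compress the result
$ sed -n '/2021-12-02T14:41/,/<end-of-snapshot-timestamp>/p' data-1.json > data-1-snapshot.json
$ gzip data-1-snapshot.json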

I have zipped the files. You should be able to download and view the files now.

Thank you

Looks like it's performing fine, but you've configured ES to split each file into chunks of 32 bytes?

writeBlob(indices/aGkkRKEdQ4qxP_n3An93tg/0/__dqJn2U4XQpSdpaLa4n8zAw.part1650, stream, 32) - done

Typical chunk sizes are measured in GBs, not bytes. Uploading a 1 MB blob normally takes around 50 ms, but if you split it into ~30,000 32-byte chunks then even a few milliseconds of per-chunk overhead add up to minutes of wasted time.
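For reference, chunk_size is a repository setting, so the fix is to re-register the snapshot repository with a sensible value (or omit the setting entirely to get the default). A rough sketch, assuming an S3 repository; the repository and bucket names here are just placeholders, so adjust for whichever repository type you actually use:

$ # re-register the repository with a larger chunk_size (placeholder names)
$ curl -X PUT "localhost:9200/_snapshot/my_repository?pretty" \
    -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket",
    "chunk_size": "1gb"
  }
}'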


Thanks @DavidTurner. I'm not really sure where I saw the 32-byte chunk size (maybe somewhere in the docs or on GitHub), but I have updated it to the default setting of 64 MB and now the snapshots are back to 1-3 s.
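In case anyone else hits this later: you can check which chunk_size a repository is actually using with something like the following (the repository name is a placeholder):

$ curl -X GET "localhost:9200/_snapshot/my_repository?pretty"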

Thank you so much for your help. Really grateful.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.