I followed the repository-gcs guide to install a snapshot repository for gcs a month ago. https://www.elastic.co/guide/en/elasticsearch/plugins/7.5/repository-gcs.html
POST http://elasticsearch-cluster:9200/_snapshot/daily
{"type": "gcs",
"settings": {"bucket": "my-bucket",
"base-path": "es-backup-static"}
}
Everything worked fine since beginning of June. We are creating a backup every 6 hours of our elasticsearch cluster with the corresponding timestamp as name.
PUT "http://elasticsearch-cluster:9200/_snapshot/daily/2020-07-18-12-00-00"
{ "indices": "index-ioc,index-file",
"include_global_state": false}
Since last week we are facing an issue which leads to high load on the gcs bucket. When enable the daily snapshot repository, the amount of requests/second are raising up to 500. After removing the repository the amount of requests is going down again.
Of course gcs does not allow such a high amount of requests and we get http 429 errors. Currently its not even possible anymore to enable the repository and to fetch the snapshots with:
GET http://elasticsearch-cluster:9200/_cat/snapshots/daily
Response:
{"error":{"root_cause":[{"type":"snapshot_exception","reason":"
[tmp:2020-06-28-00-01-03/Tkf58STgRNun5P_hAaIo0Q] Snapshot could not be read"}],"type":"snapshot_exception","reason":"[tmp:2020-06-28-00-01-03/Tkf58STgRNun5P_hAaIo0Q] Snapshot could not be read","caused_by":{"type":"storage_exception","reason":"429 Too Many Requests\nThe project exceeded the rate limit for CloudKMS decrypt operations.","caused_by":{"type":"google_json_response_exception","reason":"429 Too Many Requests\nThe project exceeded the rate limit for CloudKMS decrypt operations."}}},"status":500}
In the log we see the following messages then:
es-1 elasticsearch {"type": "server", "timestamp": "2020-07-20T13:35:27,511Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "es", "node.name": "es-1", "message": "path: /_snapshot/daily/_all, params: {repository=daily, snapshot=_all}", "cluster.uuid": "dGPxO0YdQDioz1rvpigPfg", "node.id": "nF6-bsDdRxqkcpYBnYvv8g" ,
es-1 elasticsearch "stacktrace": ["org.elasticsearch.transport.RemoteTransportException: [es-0][10.22.20.83:9300][cluster:admin/snapshot/get]",
es-1 elasticsearch "Caused by: org.elasticsearch.snapshots.SnapshotException: [daily:2020-07-06-06-00-09/C3WiyagzQyqNqKQ_XF7dYQ] Snapshot could not be read",
es-1 elasticsearch "at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:216) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:136) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:56) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:98) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.lambda$doStart$3(TransportMasterNodeAction.java:169) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]",
es-1 elasticsearch "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]",
es-1 elasticsearch "at java.lang.Thread.run(Thread.java:830) [?:?]",
es-1 elasticsearch "Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: storage_exception: 429 Too Many Requests",
es-1 elasticsearch "The project exceeded the rate limit for CloudKMS decrypt operations.",
es-1 elasticsearch "at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:227) ~[?:?]",
es-1 elasticsearch "at com.google.cloud.storage.spi.v1.HttpStorageRpc.read(HttpStorageRpc.java:690) ~[?:?]",
es-1 elasticsearch "at com.google.cloud.storage.BlobReadChannel$1.call(BlobReadChannel.java:127) ~[?:?]",
es-1 elasticsearch "at com.google.cloud.storage.BlobReadChannel$1.call(BlobReadChannel.java:124) ~[?:?]",
es-1 elasticsearch "at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105) ~[?:?]",
es-1 elasticsearch "at com.google.cloud.RetryHelper.run(RetryHelper.java:76) ~[?:?]",
es-1 elasticsearch "at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50) ~[?:?]",
es-1 elasticsearch "at com.google.cloud.storage.BlobReadChannel.read(BlobReadChannel.java:123) ~[?:?]",
es-1 elasticsearch "at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore$1.lambda$read$0(GoogleCloudStorageBlobStore.java:189) ~[?:?]",
es-1 elasticsearch "at java.security.AccessController.doPrivileged(AccessController.java:554) ~[?:?]",
es-1 elasticsearch "at org.elasticsearch.repositories.gcs.SocketAccess.doPrivilegedIOException(SocketAccess.java:44) ~[?:?]",
es-1 elasticsearch "at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore$1.read(GoogleCloudStorageBlobStore.java:189) ~[?:?]",
es-1 elasticsearch "at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65) ~[?:?]",
es-1 elasticsearch "at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109) ~[?:?]",
es-1 elasticsearch "at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[?:?]",
es-1 elasticsearch "at java.io.InputStream.read(InputStream.java:213) ~[?:?]",
es-1 elasticsearch "at org.elasticsearch.common.io.Streams.doCopy(Streams.java:86) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.common.io.Streams.copy(Streams.java:79) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.common.io.Streams.copy(Streams.java:61) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.common.io.Streams.readFully(Streams.java:252) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:135) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.read(ChecksumBlobStoreFormat.java:114) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:754) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:208) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:136) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:56) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:98) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.lambda$doStart$3(TransportMasterNodeAction.java:169) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.1.jar:7.5.1]",
es-1 elasticsearch "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]",
es-1 elasticsearch "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]",
es-1 elasticsearch "at java.lang.Thread.run(Thread.java:830) ~[?:?]",
es-1 elasticsearch "Caused by: java.io.IOException: 429 Too Many Requests",
es-1 elasticsearch "The project exceeded the rate limit for CloudKMS decrypt operations.",
es-1 elasticsearch "at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150) ~[?:?]",
es-1 elasticsearch "at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113) ~[?:?]",
es-1 elasticsearch "at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40) ~[?:?]",
es-1 elasticsearch "at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:417) ~[?:?]",
es-1 elasticsearch "at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1089) ~[?:?]",
es-1 elasticsearch "at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:515) ~[?:?]",
es-1 elasticsearch "at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:448) ~[?:?]",
es-1 elasticsearch "at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeMedia(AbstractGoogleClientRequest.java:476) ~[?:?]",
es-1 elasticsearch "at com.google.api.services.storage.Storage$Objects$Get.executeMedia(Storage.java:6981) ~[?:?]",
es-1 elasticsearch "at com.google.cloud.storage.spi.v1.HttpStorageRpc.read(HttpStorageRpc.java:668) ~[?:?]",
es-1 elasticsearch "at com.google.cloud.storage.BlobReadChannel$1.call(BlobReadChannel.java:127) ~[?:?]",
es-1 elasticsearch "at com.google.cloud.storage.BlobReadChannel$1.call(BlobReadChannel.java:124) ~[?:?]",
es-1 elasticsearch "at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105) ~[?:?]",
es-1 elasticsearch "at com.google.cloud.RetryHelper.run(RetryHelper.java:76) ~[?:?]",
es-1 elasticsearch "at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50) ~[?:?]",
es-1 elasticsearch "at com.google.cloud.storage.BlobReadChannel.read(BlobReadChannel.java:123) ~[?:?]",
es-1 elasticsearch "at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore$1.lambda$read$0(GoogleCloudStorageBlobStore.java:189) ~[?:?]",
es-1 elasticsearch "at java.security.AccessController.doPrivileged(AccessController.java:554) ~[?:?]",
es-1 elasticsearch "at org.elasticsearch.repositories.gcs.SocketAccess.doPrivilegedIOException(SocketAccess.java:44) ~[?:?]",
es-1 elasticsearch "at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore$1.read(GoogleCloudStorageBlobStore.java:189) ~[?:?]",
es-1 elasticsearch "at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65) ~[?:?]",
es-1 elasticsearch "at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109) ~[?:?]",
es-1 elasticsearch "at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[?:?]",
es-1 elasticsearch "at java.io.InputStream.read(InputStream.java:213) ~[?:?]",
es-1 elasticsearch "at org.elasticsearch.common.io.Streams.doCopy(Streams.java:86) ~[elasticsearch-7.5.1.jar:7.5.1]",
...
Did anyone already had this issue? How can I fix these requests spiking always? I have no idea why it worked for the last month, but now it does not anymore, without changing anything.
Environment:
GKE 1.16
Google Cloud Storage
Elasticsearch v7.5.1
3 Node cluster