Hello!
We have an Elasticsearch cluster running in production. We are using github.com/olivere/elastic (v7) to create snapshots.
Here is how it goes:
- Create a *elastic.Client and verify that authentication works by fetching the ES version.
- Trigger snapshot creation in async mode, then check the status of the snapshot every minute.
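To make the polling step concrete, here is a minimal sketch of the kind of loop described above. It is not our actual code: `statusError`, `isTransient`, and `pollSnapshotState` are hypothetical names, the `fetch` callback stands in for whatever call returns the snapshot state, and treating only 429 and 503 as retryable is an assumption made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// statusError is a hypothetical error type that carries the HTTP status
// code of a failed request (not a type from olivere/elastic).
type statusError struct {
	Status int
}

func (e *statusError) Error() string {
	return fmt.Sprintf("HTTP status %d", e.Status)
}

// isTransient reports whether err looks retryable. Treating only
// 429 and 503 as transient is an assumption of this sketch.
func isTransient(err error) bool {
	var se *statusError
	if !errors.As(err, &se) {
		return false
	}
	return se.Status == 429 || se.Status == 503
}

// pollSnapshotState calls fetch every interval until the snapshot leaves
// the IN_PROGRESS state, a non-transient error occurs, or maxErrors
// transient failures have accumulated.
func pollSnapshotState(fetch func() (string, error), maxErrors int, interval time.Duration) (string, error) {
	failures := 0
	for {
		state, err := fetch()
		switch {
		case err == nil && state != "IN_PROGRESS":
			// Terminal state such as SUCCESS, FAILED, or PARTIAL.
			return state, nil
		case err == nil:
			// Snapshot is still running; keep waiting.
		case isTransient(err):
			failures++
			if failures >= maxErrors {
				return "", fmt.Errorf("giving up after %d failed attempts: %w", failures, err)
			}
		default:
			// Anything else (for example a 404) is returned immediately.
			return "", err
		}
		time.Sleep(interval)
	}
}
```

In this sketch, `fetch` would wrap the real status request, and the function would be called with something like `pollSnapshotState(fetch, 5, time.Minute)` to match the "Try N of 5" lines in the logs below.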
Here is what the logs look like:
time="2020-05-17T03:36:12Z" level=warning msg="error in fetching snapshot state. Try 1 of 5: elastic: Error 503 (Service Unavailable)" repository=esbackup-mw-elk-prod snapshot=20200516032609
time="2020-05-18T03:32:12Z" level=warning msg="error in fetching snapshot state. Try 2 of 5: elastic: Error 429 (Too Many Requests): [parent] Data too large, data for [<http_request>] would be [8162093384/7.6gb], which is larger than the limit of [8127315968/7.5gb], real usage: [8162093384/7.6gb], new bytes reserved: [0/0b] [type=circuit_breaking_exception]" repository=esbackup-mw-elk-prod snapshot=20200516032609
time="2020-05-18T03:40:12Z" level=warning msg="error in fetching snapshot state. Try 3 of 5: elastic: Error 503 (Service Unavailable)" repository=esbackup-mw-elk-prod snapshot=20200516032609
time="2020-05-20T10:15:09Z" level=warning msg="error in fetching snapshot state. Try 4 of 5: elastic: Error 404 (Not Found): [esbackup-mw-elk-prod:20200516032609] is missing [type=snapshot_missing_exception]" repository=esbackup-mw-elk-prod snapshot=20200516032609
Upon googling, I learned that a circuit_breaking_exception can occur when there is less memory available than an operation needs [1].
What I don't understand is why it starts returning "Not Found" after a while and never recovers.
If I send a curl request to get the snapshot state, it reports the state just fine. If I use the elastic wrapper, it keeps returning Not Found errors.