[ES 8.6.1] Elastic Agent not deleting SQS messages after processing

Hi

I am using the Elastic Agent integration to monitor ELB logs (via an S3-SQS setup). I can see the ELB logs in the Elasticsearch cluster, but my SQS in-flight message count keeps increasing; it looks like the agent is not deleting all the messages after processing them. I have granted all the relevant permissions to the SQS queue and the agent (Ref: Getting AWS logs from S3 using Filebeat and the Elastic Stack | Elastic Blog). Looking at the agent logs, I also found the errors shown after the snippet below, which may be related.
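
For reference, this is roughly how I confirm that the backlog keeps growing rather than draining. It is only a minimal sketch using boto3; the queue URL and region are placeholders for my actual setup:

import boto3

# Placeholder queue URL and region -- substitute the real ELB-logs notification queue.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/elb-logs-queue"

sqs = boto3.client("sqs", region_name="us-east-1")

attrs = sqs.get_queue_attributes(
    QueueUrl=QUEUE_URL,
    AttributeNames=[
        "ApproximateNumberOfMessages",            # messages waiting to be picked up
        "ApproximateNumberOfMessagesNotVisible",  # in-flight messages (received but not yet deleted)
    ],
)["Attributes"]

print("visible:  ", attrs["ApproximateNumberOfMessages"])
print("in-flight:", attrs["ApproximateNumberOfMessagesNotVisible"])

The in-flight count from this check keeps climbing even though the documents show up in Elasticsearch, which is what makes me think the deletes are not happening.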

{"log.level":"error","@timestamp":"2023-02-13T10:05:04.014Z","message":"sync.(*Once).Do(...)","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:04.014Z","message":"sync/once.go:59","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:04.014Z","message":"github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*client).Close(0x55d9bdaaf7e0?)","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:04.014Z","message":"github.com/elastic/beats/v7/libbeat/publisher/pipeline/client.go:148 +0x59","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:04.014Z","message":"github.com/elastic/beats/v7/filebeat/beater.(*countingClient).Close(0xc000d764b0?)","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:04.014Z","message":"github.com/elastic/beats/v7/filebeat/beater/channels.go:145 +0x22","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:04.014Z","message":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsS3EventProcessor).processS3Events(0xc0001b7340, {0x55d9bbb86f98, 0xc000984fc0}, 0xc0016059c0, {0xc000e91400?, 0x0?})","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:04.014Z","message":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3/sqs_s3_event.go:329 +0x811","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:04.014Z","message":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsS3EventProcessor).ProcessSQS(0xc0001b7340, {0x55d9bbb86f98, 0xc000984fc0}, 0xc00103e180)","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:04.014Z","message":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3/sqs_s3_event.go:138 +0x333","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:04.014Z","message":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsReader).Receive.func1({0xc000cbb0e0, 0xc000d7e6a0, 0xc000d7e690, 0x0, 0x0, 0xc000d7e670, 0xc000d7e680, {}}, {0xc0f2a0b38f07cb57, 0x34689750ef, ...})","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:04.014Z","message":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3/sqs.go:91 +0x1bb","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:04.014Z","message":"created by github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsReader).Receive","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:04.014Z","message":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3/sqs.go:83 +0x5a5","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:07.560Z","message":"panic: close of closed channel","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:07.560Z","message":"goroutine 185 [running]:","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:07.560Z","message":"github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata.(*cache).stop(...)","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:07.560Z","message":"github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata/cache.go:97","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:07.560Z","message":"github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).Close(0xc000140780?)","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:07.560Z","message":"github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata/kubernetes.go:311 +0x4f","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:07.560Z","message":"github.com/elastic/beats/v7/libbeat/processors.Close(...)","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:07.560Z","message":"github.com/elastic/beats/v7/libbeat/processors/processor.go:58","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:07.560Z","message":"github.com/elastic/beats/v7/libbeat/publisher/processing.(*group).Close(0x7f47458a7fff?)","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:07.560Z","message":"github.com/elastic/beats/v7/libbeat/publisher/processing/processors.go:95 +0x159","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:07.560Z","message":"github.com/elastic/beats/v7/libbeat/processors.Close(...)","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:07.560Z","message":"github.com/elastic/beats/v7/libbeat/processors/processor.go:58","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-13T10:05:07.560Z","message":"github.com/elastic/beats/v7/libbeat/publisher/processing.(*group).Close(0x0?)","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"ecs.version":"1.6.0"}

I am not sure what these errors mean. Please help; any guidance will be appreciated.

TIA
Rubal


Hi, I am also facing the same problem with CloudTrail, ELB, VPC Flow, and S3 access logs in the same setup. Is this a known bug that will be resolved in v8.6.2?

Hello,
I am also facing the same issue as above, with the exact same logs.
I'm running the agents on ECS Fargate and connecting via a PrivateLink to Elastic Cloud.

I have tried several versions and am still facing this issue with 8.6.1. @rubal033 @tanwk2, may I ask whether you are running the same config/version?


Yes, I am running the same config as you, and I also noticed that when going through a PrivateLink, the metrics come in very slowly as well.

I am running the agent on EC2 instances, and the version is 8.6.1. I have also raised this on the Elastic community Slack. It looks like a bug in version 8.x to me. Have you tried Elastic version 7.x?

I have downgraded Elastic Agent to 8.5.2 and the issue is resolved. It looks like the issue only affects 8.6.x.

Thanks @rubal033. v8.5.2 is indeed working properly; in fact, the problem already appears starting with 8.5.3. Are you having any problems with the metrics you are ingesting as well?

Error message:
[elastic_agent.metricbeat][error] elastic-agent-client got error: rpc error: code = PermissionDenied desc = invalid token

@tanwk2, so far I have not faced anything with metrics, but I will check once more. Thanks for pointing out version 8.5.3; somehow I was not able to download and test it. I have raised a GitHub issue for this and will update the version details there.

GitHub issue link: [ES 8.5.3, 8.6.x]Elastic agent not deleting SQS message after processing. · Issue #2294 · elastic/elastic-agent · GitHub

Thanks all for the replies!

I just wanted to mention here that we are aware of the issue itself and are working on a fix. We will reference the above GitHub issue once there is a resolution.

I am not facing the issue with Metricbeat itself, but when trying to enroll an instance for infrastructure monitoring via Elastic Agent (v8.5.2), the agent enrolls and sends stats for a while, but after about 5 minutes it stops sending data and shows as offline. On the instance I can see the agent processes for Metricbeat and Filebeat running, but they are not able to send data, and I am getting the following logs:

{"log.level":"info","@timestamp":"2023-02-22T14:41:21.766Z","log.origin":{"file.name":"operation/operator.go","file.line":387},"message":"operation 'operation-install' skipped for metricbeat.8.5.2","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-02-22T14:41:21.959Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2023-02-22T14:41:21Z - message: Application: metricbeat--8.5.2--36643631373035623733363936343635[37b556dc-6e64-4388-8fb3-86002d9c98e4]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-02-22T14:41:21.963Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":66},"message":"Updating internal state","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-02-22T14:41:22.395Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2023-02-22T14:41:22Z - message: Application: metricbeat--8.5.2[37b556dc-6e64-4388-8fb3-86002d9c98e4]: State changed to RUNNING: Running - type: 'STATE' - sub_type: 'RUNNING'","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-02-22T14:41:22.689Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2023-02-22T14:41:22Z - message: Application: filebeat--8.5.2--36643631373035623733363936343635[37b556dc-6e64-4388-8fb3-86002d9c98e4]: State changed to RUNNING: Running - type: 'STATE' - sub_type: 'RUNNING'","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-02-22T14:41:23.167Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2023-02-22T14:41:23Z - message: Application: metricbeat--8.5.2--36643631373035623733363936343635[37b556dc-6e64-4388-8fb3-86002d9c98e4]: State changed to RUNNING: Running - type: 'STATE' - sub_type: 'RUNNING'","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-02-22T14:42:23.913Z","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":290},"message":"Possible transient error during checkin with fleet-server, retrying","error":{"message":"could not decode the response, raw response: <html>\r\n<head><title>504 Gateway Time-out</title></head>\r\n<body>\r\n<center><h1>504 Gateway Time-out</h1></center>\r\n</body>\r\n</html>\r\n"},"request_duration_ns":60006213496,"failed_checkins":1,"retry_after_ns":72985654467,"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-02-22T14:44:37.134Z","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":290},"message":"Possible transient error during checkin with fleet-server, retrying","error":{"message":"could not decode the response, raw response: <html>\r\n<head><title>504 Gateway Time-out</title></head>\r\n<body>\r\n<center><h1>504 Gateway Time-out</h1></center>\r\n</body>\r\n</html>\r\n"},"request_duration_ns":60015061311,"failed_checkins":2,"retry_after_ns":219332905044,"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-22T14:49:16.723Z","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":296},"message":"Cannot checkin in with fleet-server, retrying","error":{"message":"could not decode the response, raw response: <html>\r\n<head><title>504 Gateway Time-out</title></head>\r\n<body>\r\n<center><h1>504 Gateway Time-out</h1></center>\r\n</body>\r\n</html>\r\n"},"request_duration_ns":60035794809,"failed_checkins":3,"retry_after_ns":299217125624,"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-22T14:55:16.173Z","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":296},"message":"Cannot checkin in with fleet-server, retrying","error":{"message":"could not decode the response, raw response: <html>\r\n<head><title>504 Gateway Time-out</title></head>\r\n<body>\r\n<center><h1>504 Gateway Time-out</h1></center>\r\n</body>\r\n</html>\r\n"},"request_duration_ns":60012500568,"failed_checkins":4,"retry_after_ns":390619352450,"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-22T15:02:47.027Z","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":296},"message":"Cannot checkin in with fleet-server, retrying","error":{"message":"could not decode the response, raw response: <html>\r\n<head><title>504 Gateway Time-out</title></head>\r\n<body>\r\n<center><h1>504 Gateway Time-out</h1></center>\r\n</body>\r\n</html>\r\n"},"request_duration_ns":60013601866,"failed_checkins":5,"retry_after_ns":665133605185,"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-22T15:20:09.755Z","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":296},"message":"Cannot checkin in with fleet-server, retrying","error":{"message":"status code: 400, fleet-server returned an error: BadRequest, message: apikey auth response bLOOeYYBRMxzU71ESmQ2: [401 Unauthorized] {\"error\":{\"root_cause\":[{\"type\":\"security_exception\",\"reason\":\"unable to authenticate with provided credentials and anonymous access is not allowed for this request\",\"additional_unsuccessful_credentials\":\"API key: api key [bLOOeYYBRMxzU71ESmQ2] has been invalidated\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}}],\"type\":\"security_exception\",\"reason\":\"unable to authenticate with provided credentials and anonymous access is not allowed for this request\",\"additional_unsuccessful_credentials\":\"API key: api key [bLOOeYYBRMxzU71ESmQ2] has been invalidated\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}},\"status\":401}"},"request_duration_ns":16316340,"failed_checkins":7,"retry_after_ns":764925088950,"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-22T15:32:54.920Z","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":296},"message":"Cannot checkin in with fleet-server, retrying","error":{"message":"status code: 400, fleet-server returned an error: BadRequest, message: apikey auth response bLOOeYYBRMxzU71ESmQ2: [401 Unauthorized] {\"error\":{\"root_cause\":[{\"type\":\"security_exception\",\"reason\":\"unable to authenticate with provided credentials and anonymous access is not allowed for this request\",\"additional_unsuccessful_credentials\":\"API key: api key [bLOOeYYBRMxzU71ESmQ2] has been invalidated\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}}],\"type\":\"security_exception\",\"reason\":\"unable to authenticate with provided credentials and anonymous access is not allowed for this request\",\"additional_unsuccessful_credentials\":\"API key: api key [bLOOeYYBRMxzU71ESmQ2] has been invalidated\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}},\"status\":401}"},"request_duration_ns":18829476,"failed_checkins":8,"retry_after_ns":804145282790,"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-22T15:46:19.302Z","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":296},"message":"Cannot checkin in with fleet-server, retrying","error":{"message":"status code: 400, fleet-server returned an error: BadRequest, message: apikey auth response bLOOeYYBRMxzU71ESmQ2: [401 Unauthorized] {\"error\":{\"root_cause\":[{\"type\":\"security_exception\",\"reason\":\"unable to authenticate with provided credentials and anonymous access is not allowed for this request\",\"additional_unsuccessful_credentials\":\"API key: api key [bLOOeYYBRMxzU71ESmQ2] has been invalidated\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}}],\"type\":\"security_exception\",\"reason\":\"unable to authenticate with provided credentials and anonymous access is not allowed for this request\",\"additional_unsuccessful_credentials\":\"API key: api key [bLOOeYYBRMxzU71ESmQ2] has been invalidated\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}},\"status\":401}"},"request_duration_ns":16397215,"failed_checkins":9,"retry_after_ns":537992006263,"ecs.version":"1.6.0"}

I just want to confirm: did you face the same issue, and were you able to resolve it?
