Elastic agents reporting error: status code: 429, fleet-server returned an error: MaxLimit

Hi,

Lately we've been receiving a lot of messages like the following from our agents: "Could not communicate with fleet-server Checking API will retry, error: status code: 429, fleet-server returned an error: MaxLimit, message: exceeded the max limit". While looking for solutions, we came across similar reports such as this one:

https://githubmemory.com/repo/elastic/fleet-server/issues/571

We have around 100 agents and have configured our Fleet policy with half of the recommended values for 5000 agents, to be on the safe side for future enrollments and to see whether our issue gets resolved.

Unfortunately, this has not been the case. We've set the Max connections for the Elastic Cloud agent policy to 1000. This is our custom fleet server configuration yaml:

cache:
  num_counters: 10000        # Limit the size of the hash table to roughly 10x the expected number of elements
  max_cost: 10485760         # Limit the total size of data allowed in the cache, 10 MiB in bytes.
server.limits:
   policy_throttle: 100ms  # Roll out a new policy every 100ms; roughly 10 per second.
   checkin_limit:
     interval: 10ms        # Check in no faster than 100 per second.
     burst: 250             # Allow burst up to 250, then fall back to interval rate.
     max: 2601              # No more than 2601 long polls allowed. THIS EFFECTIVELY LIMITS MAX ENDPOINTS.
   artifact_limit:
     interval: 10ms       # Roll out 100 artifacts per second
     burst: 250             # Small burst prevents outbound buffer explosion.
     max: 500               # Only 500 transactions at a time max.  This should generally not be a relevant limitation as the transactions are cached.
   ack_limit:
     interval: 8ms        # Allow ACK only 125 per second.  ACK payload is unbounded in RAM so need to limit.
     burst: 250             # Allow burst up to 250, then fall back to interval rate.
     max: 500               # Cannot have too many processing at once due to unbounded payload size.
   enroll_limit:
     interval: 40ms       # Enroll is both CPU and RAM intensive.  Limit to 25 per second.
     burst: 25              # Allow initial burst, but limit to max.
     max: 50               # Max limit.
server.runtime:
  gc_percent: 20          # Force the GC to execute more frequently: see https://golang.org/pkg/runtime/debug/#SetGCPercent
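
As a sanity check on the values above, here is a minimal Python sketch that converts each `interval` into a sustained requests-per-second figure. It assumes each limiter admits one request per interval, which is how the inline comments read; this has not been confirmed against the fleet-server source. The `LIMITS` dictionary and the helper names are hypothetical and exist only for this illustration.

```python
import re

# Values copied from the server.limits section of the configuration above.
LIMITS = {
    "checkin_limit": {"interval": "10ms", "burst": 250, "max": 2601},
    "artifact_limit": {"interval": "10ms", "burst": 250, "max": 500},
    "ack_limit": {"interval": "8ms", "burst": 250, "max": 500},
    "enroll_limit": {"interval": "40ms", "burst": 25, "max": 50},
}

# How many of each unit fit into one second.
_UNITS_PER_SECOND = {"ms": 1000.0, "s": 1.0}


def sustained_rate(interval):
    """Return requests per second for a duration such as '10ms' or '1s'.

    Assumption: the limiter admits one request per interval once the
    burst allowance is used up.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s)", interval)
    if match is None:
        raise ValueError(f"unsupported duration: {interval!r}")
    value, unit = match.groups()
    return _UNITS_PER_SECOND[unit] / float(value)


if __name__ == "__main__":
    for name, cfg in LIMITS.items():
        rate = sustained_rate(cfg["interval"])
        print(
            f"{name}: {rate:.0f}/s sustained, "
            f"burst {cfg['burst']}, max {cfg['max']} concurrent"
        )
```

Under that assumption, the check-in limiter allows 100 requests per second, far more than roughly 100 agents should generate. That suggests the `max` (concurrent request) setting is the one to look at, rather than `interval` or `burst`.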

We'd be grateful for any suggestions to resolve this issue.
