Hi All,
Brief Fact:
We have deployed ECK on Azure AKS. The whole stack sits behind an Ingress, as shown in the diagram below. The requirement is to connect Elastic Agents residing outside the ECK cluster to the Fleet Servers running inside the cluster. Agents may connect from the internal corporate network or over the Internet, so an Ingress has been set up to load-balance across the Fleet Servers. In the Ingress we have configured three backend services:
- Elasticsearch - https://xxxx.mydomain.com:443/elasticsearch-eck
- Kibana - https://xxxx.mydomain.com:443/kibana-eck
- Fleet Server - https://xxxx.mydomain.com:443/fleetserver-eck
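For reference, our Ingress is roughly shaped like the sketch below (ingress-nginx is assumed; the Ingress name, TLS secret, and service port numbers are illustrative, not our exact manifest — the ECK service names follow the operator's `<name>-es-http` / `<name>-kb-http` / `<name>-agent-http` convention):

```yaml
# Sketch of the path-based Ingress in front of ECK (names/ports illustrative)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: eck-ingress
  namespace: observability
  annotations:
    # ECK services terminate TLS themselves, so the backends speak HTTPS
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  ingressClassName: nginx
  tls:
  - hosts: ["xxxx.mydomain.com"]
    secretName: eck-ingress-tls
  rules:
  - host: xxxx.mydomain.com
    http:
      paths:
      - path: /elasticsearch-eck
        pathType: Prefix
        backend:
          service:
            name: elasticsearch-eck-es-http
            port:
              number: 9200
      - path: /kibana-eck
        pathType: Prefix
        backend:
          service:
            name: kibana-eck-kb-http
            port:
              number: 5601
      - path: /fleetserver-eck
        pathType: Prefix
        backend:
          service:
            name: fleet-server-eck-agent-http
            port:
              number: 8220
```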
We have no problem connecting to Kibana and Elasticsearch through the Ingress.
Issue Currently Being Faced:
The issue we are facing: when any Elastic Agent outside the cluster tries to connect to the Fleet Server through the Ingress, the agent enrolls successfully but then turns unhealthy.
What we found in the local agent's log is that after the agent enrolls with the Fleet Server (the Ingress URL https://xxxx.mydomain.com:443/fleetserver-eck is used during enrollment), the Fleet Server returns its internal URL, https://fleet-server-eck-agent-http.namespace.svc:8220, to the agent. This is the Fleet Server's Kubernetes service URL, which the external Elastic Agent has no way to resolve.
The exact error is:

```json
{"log.level":"error","@timestamp":"2022-08-26T09:30:13.406Z","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":211},"message":"failed to dispatch actions, error: fail to communicate with updated API client hosts: Get \"https://fleet-server-eck-agent-http.namespace.svc:8220/api/status?\": lookup fleet-server-eck-agent-http.namespace.svc on 10.96.0.10:53: no such host","ecs.version":"1.6.0"}
```
Different Options Tried
1. Added the Ingress URL to the Kibana setting `xpack.fleet.agents.fleet_server.hosts`, alongside the Fleet Server's service URL, i.e.:
   - https://xxxx.mydomain.com:443/fleetserver-eck
   - https://fleet-server-eck-agent-http.namespace.svc:8220
2. Passed `--proxy-url` with the Ingress URL https://xxxx.mydomain.com:443/fleetserver-eck when starting the Elastic Agent.

Neither of these options helped.
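For clarity, the first option corresponds to a Kibana config fragment like this (a sketch of what we tried, with the Ingress URL listed first; only the relevant setting is shown):

```yaml
# Kibana config fragment for option 1: Ingress URL ahead of the
# in-cluster Fleet Server service URL
config:
  xpack.fleet.agents.fleet_server.hosts:
  - "https://xxxx.mydomain.com:443/fleetserver-eck"
  - "https://fleet-server-eck-agent-http.observability.svc:8220"
```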
Note: when we curl https://xxxx.mydomain.com:443/fleetserver-eck/api/status, it returns a healthy status.
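The manual enrollment we attempt from an external host looks roughly like the sketch below (the enrollment token is a placeholder; `--proxy-url` shown as per option 2 above):

```shell
# Sketch of external enrollment against the Ingress URL (token is a placeholder)
elastic-agent install \
  --url=https://xxxx.mydomain.com:443/fleetserver-eck \
  --enrollment-token=<enrollment-token> \
  --proxy-url=https://xxxx.mydomain.com:443/fleetserver-eck
```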
Elastic Agent Configuration
```yaml
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent-ums
  namespace: observability
spec:
  version: 8.4.0
  kibanaRef:
    name: kibana-eck
  fleetServerRef:
    name: fleet-server-eck
  mode: fleet
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: elastic-agent-serviceaccount
        hostNetwork: true
        dnsPolicy: ClusterFirstWithHostNet
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
```
Fleet Server Configuration
```yaml
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server-eck
  namespace: observability
spec:
  version: 8.4.0
  kibanaRef:
    name: kibana-eck
  elasticsearchRefs:
  - name: elasticsearch-eck
  mode: fleet
  fleetServerEnabled: true
  deployment:
    replicas: 2
    podTemplate:
      spec:
        serviceAccountName: fleet-server-serviceaccount
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
```
Kibana Configuration
```yaml
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana-eck
  namespace: observability
spec:
  version: 8.4.0
  count: 2
  elasticsearchRef:
    name: elasticsearch-eck
  config:
    xpack.fleet.agents.elasticsearch.hosts:
      ["https://elasticsearch-eck-es-http.observability.svc:9200"]
    xpack.fleet.agents.fleet_server.hosts:
      ["https://fleet-server-eck-agent-http.observability.svc:8220"]
    xpack.fleet.packages:
    - name: system
      version: latest
    - name: elastic_agent
      version: latest
    - name: fleet_server
      version: latest
    - name: kubernetes
      # pinning this version as the next one introduced a kube-proxy host setting default that breaks this recipe,
      # see https://github.com/elastic/integrations/pull/1565 for more details
      version: 0.14.0
    - name: apm
      version: latest
    xpack.fleet.agentPolicies:
    - name: Fleet Server on ECK policy
      id: eck-fleet-server
      namespace: observability
      monitoring_enabled:
      - logs
      - metrics
      is_default_fleet_server: true
      package_policies:
      - name: fleet_server-1
        id: fleet_server-1
        package:
          name: fleet_server
    - name: Elastic Agent on ECK policy
      id: eck-agent
      namespace: observability
      monitoring_enabled:
      - logs
      - metrics
      unenroll_timeout: 900
      is_default: true
      package_policies:
      - name: system-1
        id: system-1
        package:
          name: system
      - name: kubernetes-1
        id: kubernetes-1
        package:
          name: kubernetes
      - name: apm-1
        id: apm-1
        package:
          name: apm
        inputs:
        - type: apm
          enabled: true
          vars:
          - name: host
            value: 0.0.0.0:8200
```
We have been stuck on this issue for many days now, and any help is much appreciated. Please let us know if there is any additional configuration we are missing, and whether what we are trying to achieve is even supported at the moment. Thanks