Unable to deploy Fleet Server using ECK Operator in OpenShift

Hello,
We are currently trying to deploy the Elastic Stack in an OpenShift 4.16 cluster.
So far we have managed to deploy Elasticsearch, Kibana and Logstash by following the documentation, but when we try to deploy Fleet Server it goes into a crash loop with this error:
Error: request to get security token from Kibana failed: fail to execute the HTTP POST request: Post "http://kibana:5601/api/fleet/service_tokens": lookup kibana on 172.30.0.10:53: server misbehaving

What we fail to understand is where http://kibana:5601 is coming from, as everything is managed through the ECK operator and the Kibana service is named kibana-kb-http.

For reference, here are the Elasticsearch, Kibana and Agent (Fleet Server) manifests.

Elasticsearch:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
  namespace: secelk
  labels:
    env: elasticsearch
spec:
  auth: {}
  http:
    service:
      metadata: {}
      spec: {}
    tls:
      certificate:
        secretName: elastic-secelk-tls
  monitoring:
    logs: {}
    metrics: {}
  nodeSets:
    - config:
        node.store.allow_mmap: false
        xpack.security.authc.realms:
          saml:
            saml1:
              attributes.principal: nameid
              idp.entity_id: 'Removed'
              idp.metadata.path: /usr/share/elasticsearch/config/saml/idp-saml-metadata.xml
              order: 2
              sp.acs: 'Removed'
              sp.entity_id: Removed
              sp.logout: 'Removed'
      count: 3
      name: default
      podTemplate:
        metadata:
          creationTimestamp: null
        spec:
          containers:
            - name: elasticsearch
              resources:
                limits:
                  cpu: '2'
                  memory: 4Gi
                requests:
                  cpu: '1'
                  memory: 4Gi
              volumeMounts:
                - mountPath: /usr/share/elasticsearch/config/saml
                  name: idp-saml-metadata
          volumes:
            - name: idp-saml-metadata
              secret:
                secretName: idp-saml-metadata
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
  remoteClusterServer: {}
  transport:
    service:
      metadata: {}
      spec: {}
    tls:
      certificate: {}
      certificateAuthorities: {}
  updateStrategy:
    changeBudget: {}
  version: 8.17.3

Kibana:

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
  namespace: secelk
spec:
  config:
    elasticsearch.ssl.certificateAuthorities: /etc/certs/tls.crt
    server.publicBaseUrl: 'Removed'
    xpack.fleet.agentPolicies:
      - id: eck-fleet-server
        is_managed: true
        monitoring_enabled:
          - logs
          - metrics
        name: Fleet Server on ECK policy
        namespace: secelk
        package_policies:
          - id: fleet_server-1
            name: fleet_server-1
            package:
              name: fleet_server
        unenroll_timeout: 900
      - id: eck-agent
        is_default: true
        is_managed: true
        monitoring_enabled:
          - logs
          - metrics
        name: Elastic Agent on ECK policy
        namespace: secelk
        package_policies:
          - id: system-1
            name: system-1
            package:
              name: system
        unenroll_timeout: 900
    xpack.fleet.agents.fleet_server.hosts:
      - 'https://fleet-server-agent-http.secelk.svc:8220'
    xpack.fleet.outputs:
      - hosts:
          - 'https://elasticsearch-es-http.secelk.svc.cluster.local:9200'
        id: eck-fleet-agent-output-elasticsearch
        is_default: true
        name: eck-elasticsearch
        ssl:
          certificate_authorities:
            - /mnt/elastic-internal/elasticsearch-association/secelk/elasticsearch/certs/ca.crt
        type: elasticsearch
    xpack.fleet.packages:
      - name: system
        version: latest
      - name: elastic_agent
        version: latest
      - name: fleet_server
        version: latest
    xpack.fleet.registryProxyUrl: 'Removed'
    xpack.security.authc.providers:
      basic.basic1:
        order: 1
      saml.saml1:
        order: 0
        realm: saml1
  count: 1
  elasticsearchRef:
    name: elasticsearch
  enterpriseSearchRef: {}
  http:
    service:
      metadata: {}
      spec: {}
    tls:
      certificate:
        secretName: kibana-tls
  monitoring:
    logs: {}
    metrics: {}
  podTemplate:
    metadata:
      creationTimestamp: null
    spec:
      containers:
        - env:
            - name: NODE_EXTRA_CA_CERTS
              value: /etc/certs/tls.crt
          name: kibana
          resources:
            limits:
              cpu: '1'
              memory: 1Gi
          volumeMounts:
            - mountPath: /etc/certs
              name: elasticsearch-certs
              readOnly: true
      volumes:
        - name: elasticsearch-certs
          secret:
            secretName: elastic-secelk-tls
  version: 8.17.3

Fleet Server:

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server
  namespace: secelk
spec:
  deployment:
    podTemplate:
      metadata:
        creationTimestamp: null
      spec:
        automountServiceAccountToken: true
        securityContext:
          fsGroup: 1000
        serviceAccountName: fleet-server
        volumes:
          - emptyDir: {}
            name: agent-data
    strategy: {}
  fleetServerEnabled: true
  fleetServerRef: {}
  http:
    service:
      metadata: {}
      spec: {}
    tls:
      certificate: {}
  kibanaRef:
    name: kibana
    namespace: secelk
  mode: fleet
  policyID: eck-fleet-server
  version: 8.17.3

As an additional note, we have tried deploying Fleet Server both as non-root without persistence and with root permissions. Neither worked; both fail with the error mentioned at the start.

Thanks to anyone who can shed some light on why it's not working.

Could you please try setting xpack.fleet.agents.elasticsearch.hosts? Refer to this recipe for a complete example.

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
spec:
  version: 8.17.0
  count: 1
  elasticsearchRef:
    name: elasticsearch
  config:
    xpack.fleet.agents.elasticsearch.hosts: ["https://elasticsearch-es-http.default.svc:9200"]
    xpack.fleet.agents.fleet_server.hosts: ["https://fleet-server-agent-http.default.svc:8220"]
    xpack.fleet.packages: [...]

Hi @michael.morello.

First of all, thank you for your answer.
We have tried the config you suggested, both when deploying the Agent with root permissions and when deploying it without root permissions as per the Elastic documentation:
https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-elastic-agent-fleet-configuration.html

Current Kibana config with xpack.fleet.agents.elasticsearch.hosts set:

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
  namespace: secelk
spec:
  config:
    elasticsearch.ssl.certificateAuthorities: /etc/certs/tls.crt
    server.publicBaseUrl: 'Removed'
    xpack.fleet.agentPolicies:
      - id: eck-fleet-server
        is_managed: true
        monitoring_enabled:
          - logs
          - metrics
        name: Fleet Server on ECK policy
        namespace: secelk
        package_policies:
          - id: fleet_server-1
            name: fleet_server-1
            package:
              name: fleet_server
        unenroll_timeout: 900
      - id: eck-agent
        is_default: true
        is_managed: true
        monitoring_enabled:
          - logs
          - metrics
        name: Elastic Agent on ECK policy
        namespace: secelk
        package_policies:
          - id: system-1
            name: system-1
            package:
              name: system
        unenroll_timeout: 900
    xpack.fleet.agents.elasticsearch.hosts:
      - 'https://elasticsearch-es-http.secelk.svc:9200'
    xpack.fleet.agents.fleet_server.hosts:
      - 'https://fleet-server-agent-http.secelk.svc:8220'
    xpack.fleet.packages:
      - name: system
        version: latest
      - name: elastic_agent
        version: latest
      - name: fleet_server
        version: latest
    xpack.fleet.registryProxyUrl: 'Removed'
    xpack.security.authc.providers:
      basic.basic1:
        order: 1
      saml.saml1:
        order: 0
        realm: saml1
  count: 1
  elasticsearchRef:
    name: elasticsearch
  enterpriseSearchRef: {}
  http:
    service:
      metadata: {}
      spec: {}
    tls:
      certificate:
        secretName: kibana-tls
  monitoring:
    logs: {}
    metrics: {}
  podTemplate:
    metadata:
      creationTimestamp: null
    spec:
      containers:
        - env:
            - name: NODE_EXTRA_CA_CERTS
              value: /etc/certs/tls.crt
          name: kibana
          resources:
            limits:
              cpu: '1'
              memory: 1Gi
          volumeMounts:
            - mountPath: /etc/certs
              name: elasticsearch-certs
              readOnly: true
      volumes:
        - name: elasticsearch-certs
          secret:
            secretName: elastic-secelk-tls
  version: 8.17.3

In both cases the error is still the same:

Error: request to get security token from Kibana failed: fail to execute the HTTP POST request: Post "http://kibana:5601/api/fleet/service_tokens": lookup kibana on 172.30.0.10:53: server misbehaving

The elasticsearchRefs field is missing from your Agent manifest:

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server
  namespace: secelk
spec:
[...]
  fleetServerEnabled: true
  elasticsearchRefs: ## Here
    - name: elasticsearch
      namespace: secelk
[...]
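
For convenience, here is a rough sketch of what the complete Agent resource could look like once the Elasticsearch reference is added to the manifest posted earlier in this thread. It assumes the names already used above (Elasticsearch `elasticsearch`, Kibana `kibana`, service account `fleet-server`, namespace `secelk`) and has not been tested on this cluster. The empty default fields from the original manifest are left out, and the pod template settings are simply carried over from it.

```yaml
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server
  namespace: secelk
spec:
  version: 8.17.3
  mode: fleet
  fleetServerEnabled: true
  policyID: eck-fleet-server
  # Association to Kibana (already present in the original manifest)
  kibanaRef:
    name: kibana
    namespace: secelk
  # Association to Elasticsearch (the field that was missing)
  elasticsearchRefs:
    - name: elasticsearch
      namespace: secelk
  deployment:
    podTemplate:
      spec:
        # Carried over unchanged from the original manifest
        serviceAccountName: fleet-server
        automountServiceAccountToken: true
        securityContext:
          fsGroup: 1000
        volumes:
          - name: agent-data
            emptyDir: {}
```

After applying it, the Fleet Server pod should be recreated by the operator; if the same error still appears, the environment variables on the new pod are a good place to check which Kibana URL is being passed in.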