Issue when enabling TLS in Fluent Bit with Elasticsearch as output

I am setting up an EFK stack in a single-host Kubernetes cluster. Everything works perfectly over HTTP. However, when I enable TLS in the Fluent Bit ConfigMap, I get the following errors:

[2024/07/04 16:51:57] [error] [tls] error: unexpected EOF
[2024/07/04 16:51:57] [ warn] [engine] failed to flush chunk '1-1720111186.704158600.flb', retry in 84 seconds: task_id=47, input=tail.0 > output=es.0 (out_id=0)
[2024/07/04 16:51:58] [error] [tls] error: unexpected EOF
[2024/07/04 16:51:58] [error] [tls] error: unexpected EOF
[2024/07/04 16:51:58] [ warn] [engine] failed to flush chunk '1-1720111346.704169721.flb', retry in 612 seconds: task_id=208, input=tail.0 > output=es.0 (out_id=0)
[2024/07/04 16:51:58] [ warn] [engine] failed to flush chunk '1-1720111588.707272575.flb', retry in 68 seconds: task_id=450, input=tail.0 > output=es.0 (out_id=0)

Note that TLS works perfectly between Kibana and Elasticsearch, so their configuration seems to be fine.

Now, this is the Fluent Bit config I have for the output plugin:

output-elasticsearch.conf: |
    [OUTPUT]
        Name                es
        Match               *
        Host                ${FLUENT_ELASTICSEARCH_HOST}
        Port                ${FLUENT_ELASTICSEARCH_PORT}
        Logstash_Format     Off
        Replace_Dots        On
        Retry_Limit         False
        Suppress_Type_Name  On
        tls                 On
        tls.verify          Off
        tls.debug           3
        tls.ca_file         /fluent-bit/tls/tls.crt
        tls.crt_file        /fluent-bit/tls/tls.crt
        tls.key_file        /fluent-bit/tls/tls.key
        HTTP_User           ${FLUENT_ELASTICSEARCH_USER}
        HTTP_Passwd         ${FLUENT_ELASTICSEARCH_PASSWORD}

And this, in my daemonSet.yaml, is how I mount the certificates:

spec:
  containers:
    - name: fluent-bit
      image: fluent/fluent-bit:3.0.7 # Latest
      imagePullPolicy: Always
      ports:
        - containerPort: 2020
      env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        - name: FLUENT_ELASTICSEARCH_USER # Once this works, load it from a Secret
          value: "kibana_system" # I am taking advantage of Kibana's user
        - name: FLUENT_ELASTICSEARCH_PASSWORD
          value: "herethepassword"
      volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
        - name: tls-certs
          mountPath: /fluent-bit/tls
  terminationGracePeriodSeconds: 10
  volumes:
    - name: varlog
      hostPath:
        path: /var/log
    - name: varlibdockercontainers
      hostPath:
        path: /var/lib/docker/containers
    - name: fluent-bit-config
      configMap:
        name: fluent-bit-config
    - name: tls-certs
      secret:
        secretName: elasticsearch-certs

The elasticsearch-certs Secret is the same one I am currently using with Elasticsearch, and it works for the Kibana-Elasticsearch communication.
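For reference, a Secret with the shape Fluent Bit expects here (tls.crt and tls.key keys, so the tls.*_file paths in the ConfigMap resolve under the mount path) looks roughly like this (a sketch with placeholder values, not my real data):

    apiVersion: v1
    kind: Secret
    metadata:
      name: elasticsearch-certs
    type: kubernetes.io/tls
    data:
      tls.crt: <base64-encoded certificate>   # self-signed cert, also used as the CA
      tls.key: <base64-encoded private key>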

I have found someone with the same issue; I am not really sure if it is a bug or if something is misconfigured: [error] [tls] error: unexpected EOF in versions after 2.0.6 · Issue #8452 · fluent/fluent-bit · GitHub

Also, as said, Kibana works, but it makes the request via a URL; I am not sure whether this is relevant:

spec:
  containers:
    - name: kibana
      image: docker.elastic.co/kibana/kibana:8.14.1 # Note: exactly the same version as Elasticsearch!
      resources:
        limits:
          cpu: 1000m
          memory: 1Gi
        requests:
          cpu: 700m
          memory: 1Gi
      env:
        - name: ELASTICSEARCH_URL
          value: https://elastic-search.basajaun-cluster:30738
          # SERVICE_TOKEN elastic/kibana/kibana-system = AAEAAWVsYXN0aWMva2liYW5hL2tpYmFuYS1zeXN0ZW06VnNGcmF0TlJUY1dGYkRla01OekVvZw
        - name: ELASTICSEARCH_USERNAME
          value: "kibana_system"
        - name: ELASTICSEARCH_PASSWORD
          value: "thepassword"
        - name: ELASTICSEARCH_SSL_VERIFICATIONMODE
          value: "none" # Solo en entornos de desarrollo, para producción usar "full"
      ports:
        - containerPort: 5601

Any help is appreciated.

There are not a lot of Fluent Bit experts here (we recommend using Elastic Agent/Beats for log shipping), so I'm going to have to guess about what some of that config means.

I can't find any useful documentation on what tls.verify actually controls. The docs just say "certificate verification", which I would assume means all verification is off (trust chain, certificate properties, hostname), but it's not explicit.

Your Kibana server has a verification-mode of none, which disables all verification. So if Fluent Bit's setting doesn't mean the same thing, that could explain why you have an issue.
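For comparison, this is how I am reading the two settings side by side; the Fluent Bit side is my assumption, since its docs aren't explicit:

    # Kibana (kibana.yml, or the ELASTICSEARCH_SSL_VERIFICATIONMODE env var):
    # "none" explicitly disables trust-chain, certificate, and hostname checks
    elasticsearch.ssl.verificationMode: none

    # Fluent Bit [OUTPUT] section: the docs only say this toggles "certificate verification"
    tls.verify      Off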

Where does this tls.crt file come from?

Is that the http_ca.crt from Elasticsearch, or something else?

Hello and thank you for the answer!

The tls.crt and the CA were the same, as the certificate was self-signed for quick testing purposes.

I have found the issue. In my cluster I use the Kubernetes Gateway API (with an NGINX controller that implements it), which does TLS termination: when traffic is received, it is decrypted and routed at the HTTP layer. Kibana gets its users and roles from Elasticsearch, which uses X-Pack for authentication and other security features, and X-Pack security requires the transport layer (node-to-node) to be secured with TLS. However, since that traffic stays within the cluster, HTTP-layer security can be disabled, which aligns with my cluster's internal requirements.
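In elasticsearch.yml terms, my setup looks roughly like this; the certificate paths are placeholders for my actual mounts:

    # Security features on: Kibana needs Elasticsearch's users and roles
    xpack.security.enabled: true

    # Transport layer (node-to-node) must be TLS-secured when security is enabled
    xpack.security.transport.ssl.enabled: true
    xpack.security.transport.ssl.verification_mode: certificate
    xpack.security.transport.ssl.key: /path/to/tls.key
    xpack.security.transport.ssl.certificate: /path/to/tls.crt
    xpack.security.transport.ssl.certificate_authorities: ["/path/to/tls.crt"]

    # HTTP layer stays plain HTTP, since the Gateway terminates TLS
    xpack.security.http.ssl.enabled: false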

The problem was that if Fluent Bit is to send logs to Elasticsearch via TLS, Elasticsearch must also have security enabled on the HTTP layer. I didn't have that part configured, so Fluent Bit was giving the error I mentioned. Note that Kibana may then give errors if it's not properly set up for such connections, so it's important to make the requests at the internal cluster-service level instead of through an external URL, as the reverse proxy will transform HTTPS into HTTP.
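In other words, had I wanted Fluent Bit to keep talking TLS directly to Elasticsearch, the missing piece would have been TLS on the HTTP layer too, roughly (paths are placeholders again):

    # What Fluent Bit's TLS output would have needed on the Elasticsearch side
    xpack.security.http.ssl.enabled: true
    xpack.security.http.ssl.key: /path/to/tls.key
    xpack.security.http.ssl.certificate: /path/to/tls.crt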

Therefore, at least in my case, since everything goes through TLS termination when entering the cluster via the Gateway API, I can safely route all internal traffic over plain HTTP.
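So the [OUTPUT] section ends up like this (same as my original config, just without the tls.* settings):

    [OUTPUT]
        Name                es
        Match               *
        Host                ${FLUENT_ELASTICSEARCH_HOST}
        Port                ${FLUENT_ELASTICSEARCH_PORT}
        Logstash_Format     Off
        Replace_Dots        On
        Retry_Limit         False
        Suppress_Type_Name  On
        # TLS off: the Gateway terminates TLS, internal traffic stays on HTTP
        tls                 Off
        HTTP_User           ${FLUENT_ELASTICSEARCH_USER}
        HTTP_Passwd         ${FLUENT_ELASTICSEARCH_PASSWORD}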