Autodiscovering Logstash nodes in Kubernetes

So, I have a Logstash StatefulSet running in Kubernetes, with a headless Service defined to establish network connections to the Logstash pods.

I deployed Beats as a Deployment using the official Docker image, and I was able to establish communication from Filebeat to Logstash. The issue is that I currently have to hardcode the Logstash hostnames in my config for Filebeat to recognize the multiple pods running in my StatefulSet.

    output.logstash:
      # The Logstash hosts
      hosts: ["logstash-0.logs:5044", "logstash-1.logs:5044", "logstash-2.logs:5044"]
      loadbalance: true
      compression_level: 3
      bulk_max_size: 8192

In a scenario where I would like to configure an HPA, I will need to change the configuration and redeploy Filebeat. Is there a way for me to specify only the headless Service and have Beats use all the underlying pods to communicate with Logstash?

I have already tried doing the following:

    output.logstash:
      # The Logstash hosts
      hosts: ["logs:5044"]
      loadbalance: true
      compression_level: 3
      bulk_max_size: 8192

But this is quite erratic: once Filebeat initiates a connection with one or two pods behind the Service, it never checks for additional pods. I know this is not really an issue with Filebeat, but is there any suggestion on how I can make Filebeat aware of newly provisioned Logstash pods?

How does Filebeat handle new ES nodes provisioned by the ECK operator? Maybe that can give a clue?

Thanks!

Hey @NerdSec :slight_smile:

Hopefully we can figure out a good solution for this. I just wanted to start by checking why you might not use ECK to deploy Logstash and Filebeat as well? Or maybe its Helm charts?

When deploying the Logstash pods, the easiest option might be to just pass the list of new pods to Filebeat and automatically redeploy it, though I do see that this might become a problem if they resize often.

The other way, with a single shared address as in your second example, how did you configure this? If Filebeat only talks to a single shared address, you can disable the loadbalance option in the Logstash output.
To get more connections initiated, so that they might spread out, can you try setting the worker count to something higher than the default?
For example, under bulk_max_size you can add "worker: 3".
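
As a rough sketch, assuming the headless service resolves as "logs" (the value 3 is just an example):

    output.logstash:
      # Single shared address (the headless service name)
      hosts: ["logs:5044"]
      # Keep loadbalance on so events are spread across all worker connections
      loadbalance: true
      compression_level: 3
      bulk_max_size: 8192
      # Each worker opens its own connection, which may land on different
      # pods behind the headless service
      worker: 3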

Maybe we can find a solution if you would be able to share the YAML for the StatefulSet plus the YAML for the load balancer service?

Hi @Marius_Iversen :slight_smile:

So, I initially started off with ECK, but it had two issues for me:

  1. Logstash is not part of ECK yet.
  2. Filebeat cannot be deployed with ECK under our PSP.

Filebeat, when deployed by ECK, needs a unique path for the data directory (totally logical). But the issue is that the only unique path it will accept is a hostPath, and by default hostPath volumes are not allowed on our cluster. Ideally, I would like to provision a PV for Filebeat (and limit the Deployment to a single replica) or deploy it as a StatefulSet, but the only two options supported via ECK are a DaemonSet or a Deployment.
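
What I had in mind was roughly this for the data volume (the claim name here is just illustrative, e.g. a Longhorn-backed PVC), but ECK does not let me configure it this way:

    volumes:
    - name: data
      persistentVolumeClaim:
        claimName: filebeat-data   # illustrative PVC name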

I need Logstash to talk to Kafka, which is our messaging queue. Ideally, I would like Filebeat to send data to Kafka directly, but Filebeat does not support SCRAM-SHA-2 authentication. Hence the requirement for two Logstash instances. This leads to a resource issue, as I am using two pods just to do pub-sub with Kafka, which is why my thoughts drifted to an HPA.

So, this is again a tricky piece. I cannot rely on Kubernetes to handle my Service for Logstash: if I do, it will take care of load balancing and Beats ends up talking to only one Logstash pod. So I exposed it as a headless Service, and with that Beats can handle the load balancing. But the problem is that the stable network identities only exist for the individual pods. If I simply use the headless Service name without the pod hostname prepended to it, Beats does a TCP negotiation and latches on to one or two pods in the cluster. So even if new pods are added later on (for example by an HPA), I will have to restart Beats for new TCP connections to be initiated.

apiVersion: v1
kind: Service
metadata:
  name: ad
  labels:
    app: logstash-ad
spec:
  ports:
  - port: 5044
    name: logs
  clusterIP: None
  selector:
    app: logstash-ad
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: logstash-ad
  name: ad
spec:
  podManagementPolicy: OrderedReady
  replicas: 5
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: logstash-ad
  serviceName: ad
  template:
    metadata:
      labels:
        app: logstash-ad
    spec:
      containers:
      - env:
        - name: LS_NODE_NAME
          value: K8S_ad
        image: logstash:7.12
        imagePullPolicy: Always
        name: ad
        resources:
          requests:
            memory: "1024Mi"
          limits:
            memory: "2048Mi"
            cpu: "2000m"
        securityContext:
          allowPrivilegeEscalation: false
          capabilities: {}
          privileged: false
          readOnlyRootFilesystem: false
          runAsNonRoot: true
          runAsGroup: 1000
          runAsUser: 1000
        stdin: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        tty: true
        volumeMounts:
        - mountPath: /usr/share/logstash/config/pipelines.yml
          name: pipelines
          subPath: pipelines.yml
        - mountPath: /usr/share/logstash/config/logstash.yml
          name: config
          subPath: logstash.yml
        - mountPath: /usr/share/logstash/config/jvm.options
          name: jvm
          subPath: jvm.options
        - mountPath: /usr/share/logstash/data
          name: data
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: secret
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1000
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 256
          items:
          - key: ad
            path: pipelines.yml
          name: ad
          optional: false
        name: pipelines
      - configMap:
          defaultMode: 256
          items:
          - key: logstash.yml
            path: logstash.yml
          name: config
          optional: false
        name: config
      - configMap:
          defaultMode: 256
          items:
          - key: jvm.small
            path: jvm.options
          name: jvm
          optional: false
        name: jvm
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data
      labels:
        logstash: ad
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 2Gi
      storageClassName: longhorn

Filebeat

apiVersion: apps/v1
kind: Deployment
metadata:
  name: filebeat-ad
  labels:
    app: filebeat-ad
spec:
  replicas: 1
  selector:
    matchLabels:
      app: filebeat-ad
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: filebeat-ad
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: ad
        args:
        - -e
        - -c
        - /etc/filebeat.yml
        image: docker.elastic.co/beats/filebeat:7.10.2
        env:
        - name: FILEPATH
          value: '/opt/elk/filebeat/ad/*.log'
        - name: NAME
          value: 'ad'
        - name: LOGSTASH
          value: "ad-0.beats:5044,ad-1.beats:5044,ad-2.beats:5044"
        resources:
          limits:
            memory: "2048Mi"
            cpu: "2000m"
        securityContext:
#          runAsGroup: 1000
          runAsUser: 1000
        volumeMounts:
        - name: filebeat
          subPath: filebeat.yml
          mountPath: /etc/filebeat.yml
        - name: logs
          mountPath: /opt/elk
        - name: data
          mountPath: /usr/share/filebeat/data
      securityContext:
        fsGroup: 1000
#        runAsGroup: 1000
        runAsUser: 1000
      volumes:
      - name: filebeat
        configMap:
          defaultMode: 256
          items:
          - key: config
            path: filebeat.yml
          name: filebeat
      - name: data
        nfs:
          path: /ifs/NFS/ad/
          server: nfs.domain.com
      - name: logs
        nfs:
          path: /ifs/NFS/ELK
          server: nfs.domain.com

PS: The data volume could have been a Longhorn volume too, but the logs volume has to be NFS, as the logs are present on a remote share. Yes, I know Filebeat is not recommended for reading logs over NFS. :smiley:

Sorry for the long post!

ConfigMap for Filebeat:

apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat
data:
  config: |-
    filebeat.inputs:
    - type: log
      paths: ["${FILEPATH}"]
      tags: ["${NAME}"]
    #================================ General =====================================
    name: "K8S - ${NAME}"
    #================================ Outputs =====================================
    # Configure what output to use when sending the data collected by the beat.
    #----------------------------- Logstash output --------------------------------
    output.logstash:
      # The Logstash hosts
      hosts: "${LOGSTASH}"
      loadbalance: true
      compression_level: 3
      bulk_max_size: 8192
    queue.mem:
      events: 65536
    processors:
    - add_host_metadata: {}

I did not specify workers, because I found that if I increase the resource limits on the Filebeat container, the throughput increases accordingly. I guess it scales similarly to how it would if it were installed on the machine, so if I limit it to 2000m CPU it will by default use two workers. I may be way off, but my impression is that the workers setting is pointless if resource limits are defined in my deployment.

The Logstash image has the relevant pipelines baked into it, governed by a CI/CD pipeline.

Quite a lot to go through, so let me just ask another question in the meantime: we added SCRAM-SHA-2 support back in 2020, for 7.11:
issue: Complete support for SCRAM-SHA-256/512 in Kafka output · Issue #16723 · elastic/beats · GitHub
PR: https://github.com/elastic/beats/pull/12867

Would this resolve some of these issues if Logstash were not needed?
Seeing that you ideally wanted the direct connection, you could maybe use ECK for most of it. Not trying to dodge the question :slight_smile:
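
Just to illustrate what the direct path could look like, here is a rough sketch of the Kafka output (broker addresses, topic, and credential variables are placeholders, not from your setup):

    output.kafka:
      # Placeholder Kafka brokers
      hosts: ["kafka-0.kafka:9092", "kafka-1.kafka:9092"]
      topic: "filebeat-ad"               # placeholder topic name
      username: "${KAFKA_USERNAME}"
      password: "${KAFKA_PASSWORD}"
      # SCRAM support was added for 7.11
      sasl.mechanism: SCRAM-SHA-512      # or SCRAM-SHA-256
      ssl.enabled: true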

I don't mind the Beats - Logstash - ES architecture if there is a need, but in general it causes more bottlenecks and more pieces to debug, especially around things like bulk requests and how the ACKs work: Filebeat sends to Logstash, Logstash sends to ES, ES acks to Logstash, and Logstash acks to Filebeat (from my understanding, though I'm not 100% sure).

For the hostPath part, I want to dig into that a bit more as well. Not to dodge the question, but I would rather provide a long-term answer than try to fix something because of limitations that might not exist anymore or that have a better workaround.

Yes, if this is available in Filebeat, then it would solve a large part of the problem. I will test this and get back to you! Thanks! :slight_smile: A follow-up question: are there any plans to add support for Filebeat to delete files after reading them?

I completely agree, hence the need to add Kafka to act as a buffer and to implement a lambda for sharing this data across teams.

:slight_smile: Thank you for taking the time to do so.

Hi @NerdSec, sorry for the delay. To follow up on your questions:

Removing the file after reading is not something we are currently looking at, I believe. It has come up in the past, but we feel there are better ways to handle this, as deleting files can cause many niche issues; stricter log rotation, for example, is usually a better option.

For the hostPath, you are indeed correct at this point, as we don't have support for StatefulSets for Beats, though I did get a comment back saying:

"You can work around the hostPath issue by configuring an emptyDir volume instead. It has the disadvantage that Beats will lose state and start ingesting from the beginning if the Pod gets recreated."

So it would be a bit of a choice, depending on what you need at this point.
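
In your Filebeat Deployment above, that would roughly mean swapping the data volume for an emptyDir while keeping the same volume name and mount path, something like:

    volumes:
    - name: data
      # emptyDir instead of hostPath/NFS; Filebeat loses its registry state if the Pod is recreated
      emptyDir: {}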


Thank you for the reply @Marius_Iversen. I ended up extending the Python code that reads the registry file into something that can read the file, send it to Kafka, and then delete it.

I initially did use Beats in a Deployment I created manually, but it still does not have support for SASL_SSL as of now, I guess. Maybe we get this a few releases down the line, or maybe it comes as a unified agent! :crossed_fingers:

Cheers! :beers:

