Events are lost when the Elasticsearch output is temporarily unavailable

Hi all,

Newbie here.

I have a Kubernetes cluster in a location with a patchy internet connection. I run Metricbeat on the cluster and I want to send metrics to Elasticsearch, which runs in the cloud. I thought that putting Logstash between Beats and ES would make it buffer events and retransmit them once ES is reachable again. However, only the metrics from the first 15 minutes of an outage are buffered (i.e. if the connection drops at 10AM and is restored at 11AM, the metrics between 10:15 and 11:00 are missing). Interestingly enough, the same happens when Beats send metrics directly to ES. What can the problem be?

This is my logstash configuration:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: logstash-configmap
    data:
      logstash.yml: |
        http.host: "0.0.0.0"
        path.config: /usr/share/logstash/pipeline
      logstash.conf: |
        input {
          beats {
            port => 5044
          }
        }
        filter {
        }
        output {
          elasticsearch {
            index => "metrics-%{[@metadata][beat]}"
            hosts => [ "https://myesincloud.com:9200/" ]
            user => "elastic"
            password => "1234567890"
            ssl => true
            ssl_certificate_verification => false
            cacert => "/etc/logstash/certificates/ca.crt"
          }
        }

This is how the pod is deployed:

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        app: logstash
      name: logstash
    spec:
      containers:
      - image: docker.elastic.co/logstash/logstash:7.12.0
        name: logstash
        ports:
        - containerPort: 25826
        - containerPort: 5044
        env:
        - name: ES_HOSTS
          value: "https://myesincloud.com:9200/"
        - name: ES_USER
          value: "elastic"
        - name: ES_PASSWORD
          value: "1234567890"
        resources: {}
        volumeMounts:
        - name: config-volume
          mountPath: /usr/share/logstash/config
        - name: logstash-pipeline-volume
          mountPath: /usr/share/logstash/pipeline
        - name: cert-ca
          mountPath: "/etc/logstash/certificates"
          readOnly: true
      restartPolicy: OnFailure
      volumes:
      - name: config-volume
        configMap:
          name: logstash-configmap
          items:
            - key: logstash.yml
              path: logstash.yml
      - name: logstash-pipeline-volume
        configMap:
          name: logstash-configmap
          items:
            - key: logstash.conf
              path: logstash.conf
      - name: cert-ca
        secret:
          secretName: myesincloud-es-http-certs-public

These are the only messages in the Logstash log:

    [ERROR] 2021-05-05 09:51:45.370 [[main]>worker0] elasticsearch - Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>64}
    [WARN ] 2021-05-05 09:51:47.126 [Ruby-0-Thread-5: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.8.2-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:241] elasticsearch - Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"https://elastic:xxxxxx@myesincloud.com:9200/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :error=>"Elasticsearch Unreachable: [https://elastic:xxxxxx@myesincloud.com:9200/][Manticore::ClientProtocolException] SSL peer shut down incorrectly"}
    . . .
    [WARN ] 2021-05-05 09:52:28.170 [Ruby-0-Thread-5: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.8.2-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:241] elasticsearch - Restored connection to ES instance {:url=>"https://elastic:xxxxxx@myesincloud.com:9200/"}

I appreciate your help.

Thank you.

Hi,

Welcome to this community :tada:
My guess is that you are hitting the size limit of Logstash's in-memory queue: Persistent queues (PQ) | Logstash Reference [8.11] | Elastic

By default, Logstash uses in-memory bounded queues between pipeline stages (inputs → pipeline workers) to buffer events. The size of these in-memory queues is fixed and not configurable

If you configure Logstash to use persistent queues as described in the documentation, Logstash will persist all incoming events to files on disk. Depending on the volume of messages, you might want to increase the maximum queue size.
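For reference, a minimal sketch of what enabling the persistent queue could look like in the logstash.yml section of the ConfigMap above (the 8gb limit is only an illustrative value, not a recommendation):

    queue.type: persisted
    # Upper bound for the on-disk queue. When it fills up, Logstash applies
    # back pressure to the beats input instead of accepting more events.
    queue.max_bytes: 8gb
    # Optional: queue location, defaults to path.data/queue
    # path.queue: /usr/share/logstash/data/queue

Keep in mind that with the Pod spec above the queue would live on the container filesystem, so the buffered events would not survive a pod restart unless you also mount a persistent volume for the data path.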

Instead of using Logstash you could also let Metricbeat itself buffer the events (note that this feature is still in beta): Configure the internal queue | Metricbeat Reference [8.11] | Elastic
This spools all events to a file so Metricbeat can retry sending the data as soon as Elasticsearch is available again.
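If you go that route, a rough sketch of the beta spool queue in metricbeat.yml could look like this (the sizes are placeholders you would tune to your outage length and metric volume):

    queue.spool:
      file:
        path: "${path.data}/spool.dat"
        # total size of the spool file and the size of its memory-mapped pages
        size: 512MiB
        page_size: 16KiB
      write:
        buffer_size: 10MiB
        # flush to disk after this many events or this much time, whichever comes first
        flush.timeout: 5s
        flush.events: 1024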

Best regards
Wolfram

Hi Wolfram,

Thank you for the tip. I configured a persistent queue and it works as expected (although I noticed that the buffered metrics take roughly 7 times more disk space than they do once stored in Elasticsearch :slight_smile: ).

I wasn't aware that Metricbeat has the option to persist data on disk - this seems to be an even better solution than Logstash.

Thank you!

Hi,

Good to know that the suggestions work.

I think that the storage overhead in Logstash is to be expected:

  1. Elasticsearch compresses stored data by default (see details here) while Logstash does not compress the queue (I think)
  2. Logstash stores metadata from each input/output/filter in each event. This contains, for example, the source IP for the beats input. You can view the metadata by writing all events, including their metadata, to a file, as shown in the sketch after this list (see details here: How to access the value in the logstash metadata - #2 by Christian_Dahlqvist).
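For example, a quick sketch of a temporary debug output you could add to the pipeline above to dump events together with their metadata (the file path is just an example):

    output {
      file {
        # rubydebug with metadata => true prints the normally hidden
        # [@metadata] fields alongside the rest of the event
        path => "/tmp/events-with-metadata.log"
        codec => rubydebug { metadata => true }
      }
    }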

Best regards
Wolfram
