Persistent Queue problems in k8s on AWS

We're currently looking into migrating our Logstash cluster from EC2 instances to Kubernetes, and we're seeing some odd behavior in the persistent queue compared to our legacy implementation. Our legacy cluster uses a persistent disk queue writing to standard EBS volumes. It's running Logstash 5.6.9 with the following JVM options and settings:

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.local.only=true
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.ssl=false
-Dfile.encoding=UTF-8
-Djava.awt.headless=true
-XX:+DisableExplicitGC
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=75
-Xms8g
-Xmx8g

  pipeline.workers: 16
  path.queue: /data/logstash/queue
  path.data: /var/lib/logstash
  path.config: /etc/logstash/conf.d
  pipeline.batch.delay: 5
  pipeline.batch.size: 125
  queue.type: persisted
  config.reload.automatic: true
  path.logs: /var/log/logstash
  config.reload.interval: 3
  queue.checkpoint.writes: 1
  queue.max_bytes: "96gb"

This works fine and spills to a disk queue as needed to handle bursts of traffic. Since it's Logstash < 6, the page files are 250MB each. Even on a standard EBS volume, which is backed by magnetic storage, it's never I/O-bound.
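For reference, this is roughly how we watch the queue grow on a legacy host (path.queue is taken from the settings above; on 6.x the pages live in a per-pipeline subdirectory such as queue/main):

# page.N files are ~250MB each on 5.6; several of them accumulating here
# during a traffic burst is the behavior we rely on
ls -lh /data/logstash/queue/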

Our new k8s cluster is configured to use gp2 EBS volumes, which are backed by SSDs. Although we had it scaled up previously, while we troubleshoot this we're running a single node with a single Logstash 6.4.2 pod on it. The pod appears to be constantly I/O-bound: the pipeline behind the queue only processes events as fast as the queue can write them to disk, yet it never writes more than a single page file.
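To see the I/O-bound behavior, something along these lines inside the pod (or on the node) is what we'd use, assuming the sysstat tools are available and the persistent volume is the device backing the Logstash data path:

# extended device stats every 5 seconds; with a checkpoint on every write we
# expect high %util combined with relatively low write throughput
iostat -xm 5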

Generating a large volume of test traffic leads to backups in the TCP buffers, visible as a large number of sockets stuck in CLOSE_WAIT (see the ss check after the configuration dump below). In extreme cases we saw Logstash continue processing traffic for nearly 24 hours after we stopped sending it, even though I thought Linux TCP sockets time out after an hour. Throughout all of this there was never more than a single page file on disk at any one time. Here are the configurations we're using:

LOGSTASH_PORT_5140_TCP_PROTO=tcp
PATH_CONFIG=/usr/share/logstash/pipeline
QUEUE_MAX_BYTES=98gb
LOGSTASH_BEATS_PORT_5140_TCP_PROTO=tcp
LOGSTASH_BEATS_PORT=tcp://100.66.60.115:5140
HOSTNAME=logstash-0
LOGSTASH_SERVICE_PORT=5044
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT=tcp://100.64.0.1:443
QUEUE_DRAIN=true
TERM=xterm
LOGSTASH_SERVICE_PORT_BEATS=5044
LOGSTASH_TCP_PORT_5140_TCP_ADDR=100.68.142.29
ELASTIC_CONTAINER=true
LOGSTASH_TCP_PORT_5140_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT=443
HTTP_PORT=9600
LOGSTASH_TCP_PORT_5145_TCP_ADDR=100.68.142.29
OLDPWD=/usr/share/logstash
LOGSTASH_TCP_SERVICE_PORT=5145
KUBERNETES_SERVICE_HOST=100.64.0.1
HTTP_HOST=0.0.0.0
LOGSTASH_BEATS_PORT_5140_TCP_ADDR=100.66.60.115
LC_ALL=en_US.UTF-8
LOGSTASH_PORT_5140_TCP_ADDR=100.69.179.45
PATH_DATA=/usr/share/logstash/data
QUEUE_CHECKPOINT_WRITES=1
LOGSTASH_PORT_5044_TCP_PORT=5044
LOGSTASH_PORT_5145_TCP_PORT=5145
LOGSTASH_BEATS_SERVICE_HOST=100.66.60.115
QUEUE_PAGE_CAPACITY=10mb
LOGSTASH_BEATS_PORT_5140_TCP_PORT=5140
LOGSTASH_BEATS_PORT_5044_TCP_PROTO=tcp
ELASTICSEARCH_HOST=
LOGSTASH_PORT_5145_TCP=tcp://100.69.179.45:5145
LOGSTASH_PORT_5044_TCP=tcp://100.69.179.45:5044
LOGSTASH_TCP_SERVICE_PORT_SYSLOG_PRIV=5145
PATH=/usr/share/logstash/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LOGSTASH_TCP_PORT_5140_TCP=tcp://100.68.142.29:5140
LOGSTASH_TCP_PORT_5145_TCP_PORT=5145
LOGSTASH_BEATS_SERVICE_PORT_HEALTHCHECK=5140
PWD=/usr/share/logstash/bin
LOGSTASH_SERVICE_HOST=100.69.179.45
LOGSTASH_BEATS_SERVICE_PORT=5140
LS_JAVA_OPTS=-Xms9000M -Xmx9000M -XX:+PrintFlagsFinal -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.rmi.port=1099 -Djava.rmi.server.hostname=127.0.0.1
LANG=en_US.UTF-8
LOGSTASH_PORT_5140_TCP=tcp://100.69.179.45:5140
LOGSTASH_TCP_SERVICE_PORT_SYSLOG_PUB=5140
QUEUE_TYPE=persisted
LOGSTASH_PORT=tcp://100.69.179.45:5044
LOGSTASH_BEATS_SERVICE_PORT_BEATS=5044
ENVIRONMENT=prod
FIELD_REFERENCE_PARSER=STRICT
SHLVL=1
HOME=/usr/share/logstash
LOGSTASH_BEATS_PORT_5044_TCP=tcp://100.66.60.115:5044
ELASTICSEARCH_PORT=
CONFIG_RELOAD_AUTOMATIC=true
KUBERNETES_PORT_443_TCP_PROTO=tcp
LOGSTASH_PORT_5140_TCP_PORT=5140
KUBERNETES_SERVICE_PORT_HTTPS=443
LOGSTASH_PORT_5145_TCP_PROTO=tcp
LOGSTASH_BEATS_PORT_5140_TCP=tcp://100.66.60.115:5140
LOGSTASH_PORT_5044_TCP_PROTO=tcp
LOGSTASH_BEATS_PORT_5044_TCP_PORT=5044
LOGSTASH_TCP_PORT=tcp://100.68.142.29:5145
LOGSTASH_PORT_5145_TCP_ADDR=100.69.179.45
CIA_PORT_80_TCP_PROTO=tcp
LOGSTASH_PORT_5044_TCP_ADDR=100.69.179.45
LOGSTASH_TCP_PORT_5145_TCP=tcp://100.68.142.29:5145
LOGSTASH_SERVICE_PORT_SYSLOG_PRIV=5145
LOGSTASH_TCP_PORT_5140_TCP_PORT=5140
LOGSTASH_TCP_PORT_5145_TCP_PROTO=tcp
LOGSTASH_BEATS_PORT_5044_TCP_ADDR=100.66.60.115
KUBERNETES_PORT_443_TCP_ADDR=100.64.0.1
LOGSTASH_TCP_SERVICE_HOST=100.68.142.29
CONFIG_RELOAD_INTERVAL=5m
PIPELINE_BATCH_DELAY=5
KUBERNETES_PORT_443_TCP=tcp://100.64.0.1:443
LOGSTASH_SERVICE_PORT_SYSLOG_PUB=5140
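For completeness, this is the kind of check behind the CLOSE_WAIT numbers mentioned earlier; 5140 and 5044 are the listener ports from the services above:

# list sockets stuck in CLOSE_WAIT on the syslog and beats listeners
ss -tn state close-wait '( sport = :5140 or sport = :5044 )'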

We've tweaked most of these settings, especially the queue-related ones, with no success. If it matters: the instance type is c5.4xlarge with a 512GiB gp2 EBS volume, running Kubernetes 1.10 and deployed using Helm.
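If anyone wants to double-check the volume type on their side, something like this works; the PVC name below is just an example of what a StatefulSet-backed chart might create, so substitute your own:

# which storage class backs the pod's PVC, and what EBS type that class provisions
kubectl get pvc data-logstash-0 -o jsonpath='{.spec.storageClassName}{"\n"}'
kubectl get storageclass gp2 -o jsonpath='{.provisioner} {.parameters.type}{"\n"}'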

Basic disk I/O testing with dd didn't show any slowdown between the node and the Logstash container, and showed both being faster than our legacy Logstash EC2 instances, as expected. Switching to an in-memory queue gave a significant performance boost, which rules out network issues.
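The buffered dd test was along these lines (block size and count are illustrative; the target is wherever the persistent volume is mounted, /usr/share/logstash/data in our pods), run on the node and again inside the container:

# sequential buffered write; without a sync flag this mostly measures
# page-cache throughput, which is what we compared first
dd if=/dev/zero of=/usr/share/logstash/data/ddtest bs=1M count=1024
rm -f /usr/share/logstash/data/ddtest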

I'm just about out of ideas, so I'd appreciate any other suggestions to try. I realize that relaxing or disabling the per-write checkpointing (queue.checkpoint.writes) would likely be a large improvement, but we need the durability, and the same setting works fine on our old cluster. Thanks.

After some further testing with dd's oflag=sync option, to more closely simulate the effect of queue.checkpoint.writes: 1, we actually saw AWS standard (magnetic) EBS volumes outperforming gp2 (SSD) EBS volumes by nearly 100%. This appears to be an AWS/EBS issue rather than a Logstash or Kubernetes one.
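For anyone who wants to reproduce the comparison, the sync test was roughly this; the small block size matters because every write has to reach the disk before the next one starts, which is what queue.checkpoint.writes: 1 effectively does to the checkpoint file:

# synchronous small writes: each 4KB block is flushed before the next one,
# approximating a checkpoint on every queue write (sizes are illustrative)
dd if=/dev/zero of=/usr/share/logstash/data/ddtest bs=4k count=5000 oflag=sync
rm -f /usr/share/logstash/data/ddtest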
