queue.drain: true not working for Logstash in a K8s setup

Karthik_N · April 3, 2023, 4:46pm

Hi Team,
We are having trouble draining the Logstash queue: the queue.drain: true setting is not working. Our Logstash currently runs in K8s, with the persistent queue on an EBS volume. After we kill one of the Logstash pods (SIGTERM received), the queue is not drained. Please find the Logstash logs below for more details.

Logstash logs after deleting one of logstash pod:
[WARN ] 2023-04-03 11:46:44.324 [SIGTERM handler] runner - SIGTERM received. Shutting down.
[INFO ] 2023-04-03 11:46:51.043 [[pMain]-pipeline-manager] javapipeline - Pipeline terminated {"pipeline.id"=>"pMain"}
[INFO ] 2023-04-03 11:46:51.705 [Converge PipelineAction::StopAndDelete] pipelinesregistry - Removed pipeline from registry successfully {:pipeline_id=>:pMain}
[INFO ] 2023-04-03 11:46:54.031 [[pecsALB]-pipeline-manager] javapipeline - Pipeline terminated {"pipeline.id"=>"pecsALB"}
[INFO ] 2023-04-03 11:46:54.131 [Converge PipelineAction::StopAndDelete] pipelinesregistry - Removed pipeline from registry successfully {:pipeline_id=>:pecsALB}
[INFO ] 2023-04-03 11:46:54.268 [[pecsMGW]-pipeline-manager] javapipeline - Pipeline terminated {"pipeline.id"=>"pecsMGW"}
[INFO ] 2023-04-03 11:46:55.121 [Converge PipelineAction::StopAndDelete] pipelinesregistry - Removed pipeline from registry successfully {:pipeline_id=>:pecsMGW}
[INFO ] 2023-04-03 11:46:55.688 [[pecsMainK8S]-pipeline-manager] javapipeline - Pipeline terminated {"pipeline.id"=>"pecsMainK8S"}
[INFO ] 2023-04-03 11:46:56.137 [Converge PipelineAction::StopAndDelete] pipelinesregistry - Removed pipeline from registry successfully {:pipeline_id=>:pecsMainK8S}
[INFO ] 2023-04-03 11:46:58.505 [[pecsConfiguration]-pipeline-manager] javapipeline - Pipeline terminated {"pipeline.id"=>"pecsConfiguration"}
[INFO ] 2023-04-03 11:46:58.507 [[pecsUpdates]-pipeline-manager] javapipeline - Pipeline terminated {"pipeline.id"=>"pecsUpdates"}
[INFO ] 2023-04-03 11:46:58.533 [Converge PipelineAction::StopAndDelete] pipelinesregistry - Removed pipeline from registry successfully {:pipeline_id=>:pecsUpdates}
[INFO ] 2023-04-03 11:46:58.685 [[pCloudedcbridge]-pipeline-manager] javapipeline - Pipeline terminated {"pipeline.id"=>"pCloudedcbridge"}
[INFO ] 2023-04-03 11:46:58.704 [Converge PipelineAction::StopAndDelete] pipelinesregistry - Removed pipeline from registry successfully {:pipeline_id=>:pCloudedcbridge}
[INFO ] 2023-04-03 11:46:58.741 [Converge PipelineAction::StopAndDelete] pipelinesregistry - Removed pipeline from registry successfully {:pipeline_id=>:pecsConfiguration}
[WARN ] 2023-04-03 11:47:00.544 [Converge PipelineAction::StopAndDelete] ShutdownWatcherExt - {"inflight_count"=>0, "stalling_threads_info"=>{}}
[ERROR] 2023-04-03 11:47:00.547 [Converge PipelineAction::StopAndDelete] ShutdownWatcherExt - The shutdown process appears to be stalled due to busy or blocked plugins. Check the logs for more information.
[INFO ] 2023-04-03 11:47:01.044 [[pSummary]-pipeline-manager] javapipeline - Pipeline terminated {"pipeline.id"=>"pSummary"}
[INFO ] 2023-04-03 11:47:01.551 [Converge PipelineAction::StopAndDelete] pipelinesregistry - Removed pipeline from registry successfully {:pipeline_id=>:pSummary}
[INFO ] 2023-04-03 11:47:01.646 [LogStash::Runner] runner - Logstash shut down.
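
As a side calculation, the time between the SIGTERM line and the final "Logstash shut down." line in the log above can be worked out from the two timestamps. This is only a minimal sketch in Python; the helper name shutdown_seconds is made up for illustration and is not part of Logstash.

```python
from datetime import datetime

# Timestamp layout used by the Logstash log lines above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def shutdown_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    finished = datetime.strptime(end, LOG_TIME_FORMAT)
    return (finished - started).total_seconds()


# Timestamps copied from the "SIGTERM received" and "Logstash shut down" lines.
elapsed = shutdown_seconds("2023-04-03 11:46:44.324", "2023-04-03 11:47:01.646")
print(f"{elapsed:.3f} s")
```

So all eight pipelines terminated roughly 17 seconds after the signal was received.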


Please let us know if any setting is missing from logstash.yml that is needed to drain the Logstash queue before Logstash shuts down.

Thanks,
Karthik

Karthik_N · April 5, 2023, 4:11pm

Dear Team,

Could somebody please help with my post about the Logstash queue.drain: true option not working?

Thanks,
Karthik

leandrojmp · April 5, 2023, 5:06pm

You need to provide more information.

Please share your entire logstash.yml and your pipelines.yml files.

Karthik_N · April 5, 2023, 5:16pm

Thank you for the quick response. Please find below the entire logstash.yml and pipelines.yml files that we are currently using.

logstash.yml

# Settings file in YAML
#
# Settings can be specified either in hierarchical form, e.g.:
#
#   pipeline:
#     batch:
#       size: 125
#       delay: 5
#
# Or as flat keys:
#
#   pipeline.batch.size: 125
#   pipeline.batch.delay: 5
#
# ------------  Node identity ------------
#
# Use a descriptive name for the node:
#
# node.name: test
#
# If omitted the node name will default to the machine's host name
#
# ------------ Data path ------------------
#
# Which directory should be used by logstash and its plugins
# for any persistent needs. Defaults to LOGSTASH_HOME/data
#
######path.data: /usr/share/logstash/data
#
# ------------ Pipeline Settings --------------
#
# The ID of the pipeline.
#
# pipeline.id: main
#
# Set the number of workers that will, in parallel, execute the filters+outputs
# stage of the pipeline.
#
# This defaults to the number of the host's CPU cores.
#
pipeline.workers: 15
#
# How many events to retrieve from inputs before sending to filters+workers
#
pipeline.batch.size: 1500
#
# How long to wait in milliseconds while polling for the next event
# before dispatching an undersized batch to filters+outputs
#
pipeline.batch.delay: 600
#
# Force Logstash to exit during shutdown even if there are still inflight
# events in memory. By default, logstash will refuse to quit until all
# received events have been pushed to the outputs.
#
# WARNING: Enabling this can lead to data loss during shutdown
#
# pipeline.unsafe_shutdown: false
#
# Set the pipeline event ordering. Options are "auto" (the default), "true" or "false".
# "auto" automatically enables ordering if the 'pipeline.workers' setting
# is also set to '1', and disables otherwise.
# "true" enforces ordering on the pipeline and prevent logstash from starting
# if there are multiple workers.
# "false" disables any extra processing necessary for preserving ordering.
#
# pipeline.ordered: auto
#
# Sets the pipeline's default value for `ecs_compatibility`, a setting that is
# available to plugins that implement an ECS Compatibility mode for use with
# the Elastic Common Schema.
# Possible values are:
# - disabled
# - v1
# - v8 (default)
# Pipelines defined before Logstash 8 operated without ECS in mind. To ensure a
# migrated pipeline continues to operate as it did before your upgrade, opt-OUT
# of ECS for the individual pipeline in its `pipelines.yml` definition. Setting
# it here will set the default for _all_ pipelines, including new ones.
#
# pipeline.ecs_compatibility: v8
pipeline.ecs_compatibility: disabled
#
# ------------ Pipeline Configuration Settings --------------
#
# Where to fetch the pipeline configuration for the main pipeline
#
# path.config:
#
# Pipeline configuration string for the main pipeline
#
# config.string:
#
# At startup, test if the configuration is valid and exit (dry run)
#
# config.test_and_exit: false
#
# Periodically check if the configuration has changed and reload the pipeline
# This can also be triggered manually through the SIGHUP signal
#
config.reload.automatic: true
#
# How often to check if the pipeline configuration has changed (in seconds)
# Note that the unit value (s) is required. Values without a qualifier (e.g. 60)
# are treated as nanoseconds.
# Setting the interval this way is not recommended and might change in later versions.
#
# config.reload.interval: 3s
#
# Show fully compiled configuration as debug log message
# NOTE: --log.level must be 'debug'
#
# config.debug: false
#
# When enabled, process escaped characters such as \n and \" in strings in the
# pipeline configuration files.
#
# config.support_escapes: false
#
# ------------ API Settings -------------
# Define settings related to the HTTP API here.
#
# The HTTP API is enabled by default. It can be disabled, but features that rely
# on it will not work as intended.
#
# api.enabled: true
#
# By default, the HTTP API is not secured and is therefore bound to only the
# host's loopback interface, ensuring that it is not accessible to the rest of
# the network.
# When secured with SSL and Basic Auth, the API is bound to _all_ interfaces
# unless configured otherwise.
#
# api.http.host: 127.0.0.1
#
# The HTTP API web server will listen on an available port from the given range.
# Values can be specified as a single port (e.g., `9600`), or an inclusive range
# of ports (e.g., `9600-9700`).
#
# api.http.port: 9600-9700
#
# The HTTP API includes a customizable "environment" value in its response,
# which can be configured here.
#
# api.environment: "production"
#
# The HTTP API can be secured with SSL (TLS). To do so, you will need to provide
# the path to a password-protected keystore in p12 or jks format, along with credentials.
#
# api.ssl.enabled: false
# api.ssl.keystore.path: /path/to/keystore.jks
# api.ssl.keystore.password: "y0uRp4$$w0rD"
#
# The HTTP API can be configured to require authentication. Acceptable values are
#  - `none`:  no auth is required (default)
#  - `basic`: clients must authenticate with HTTP Basic auth, as configured
#             with `api.auth.basic.*` options below
# api.auth.type: none
#
# When configured with `api.auth.type` `basic`, you must provide the credentials
# that requests will be validated against. Usage of Environment or Keystore
# variable replacements is encouraged (such as the value `"${HTTP_PASS}"`, which
# resolves to the value stored in the keystore's `HTTP_PASS` variable if present
# or the same variable from the environment)
#
# api.auth.basic.username: "logstash-user"
# api.auth.basic.password: "s3cUreP4$$w0rD"
#
# When setting `api.auth.basic.password`, the password should meet
# the default password policy requirements.
# The default password policy requires non-empty minimum 8 char string that
# includes a digit, upper case letter and lower case letter.
# Policy mode sets Logstash to WARN or ERROR when HTTP authentication password doesn't
# meet the password policy requirements.
# The default is WARN. Setting to ERROR enforces stronger passwords (recommended).
#
# api.auth.basic.password_policy.mode: WARN
#
# ------------ Module Settings ---------------
# Define modules here.  Modules definitions must be defined as an array.
# The simple way to see this is to prepend each `name` with a `-`, and keep
# all associated variables under the `name` they are associated with, and
# above the next, like this:
#
# modules:
#   - name: MODULE_NAME
#     var.PLUGINTYPE1.PLUGINNAME1.KEY1: VALUE
#     var.PLUGINTYPE1.PLUGINNAME1.KEY2: VALUE
#     var.PLUGINTYPE2.PLUGINNAME1.KEY1: VALUE
#     var.PLUGINTYPE3.PLUGINNAME3.KEY1: VALUE
#
# Module variable names must be in the format of
#
# var.PLUGIN_TYPE.PLUGIN_NAME.KEY
#
# modules:
#
# ------------ Cloud Settings ---------------
# Define Elastic Cloud settings here.
# Format of cloud.id is a base64 value e.g. dXMtZWFzdC0xLmF3cy5mb3VuZC5pbyRub3RhcmVhbCRpZGVudGlmaWVy
# and it may have an label prefix e.g. staging:dXMtZ...
# This will overwrite 'var.elasticsearch.hosts' and 'var.kibana.host'
# cloud.id: <identifier>
#
# Format of cloud.auth is: <user>:<pass>
# This is optional
# If supplied this will overwrite 'var.elasticsearch.username' and 'var.elasticsearch.password'
# If supplied this will overwrite 'var.kibana.username' and 'var.kibana.password'
# cloud.auth: elastic:<password>
#
# ------------ Queuing Settings --------------
#
# Internal queuing model, "memory" for legacy in-memory based queuing and
# "persisted" for disk-based acked queueing. Defaults is memory
#
# queue.type: memory

queue.type: persisted
queue.max_bytes: 4gb
queue.drain: true
queue.checkpoint.retry: true
# Default is path.data/queue.  ######path.data: /usr/share/logstash/data
### in k8s this persistent queue path should be replaced with the EBS volume
######path.queue: /data/logstash/queue

#
# If `queue.type: persisted`, the directory path where the pipeline data files will be stored.
# Each pipeline will group its PQ files in a subdirectory matching its `pipeline.id`.
# Default is path.data/queue.  ######path.data: /usr/share/logstash/data
#
# path.queue:
#
# If using queue.type: persisted, the page data files size. The queue data consists of
# append-only data files separated into pages. Default is 64mb
#
# queue.page_capacity: 64mb
#
# If using queue.type: persisted, the maximum number of unread events in the queue.
# Default is 0 (unlimited)
#
# queue.max_events: 0
#
# If using queue.type: persisted, the total capacity of the queue in number of bytes.
# If you would like more unacked events to be buffered in Logstash, you can increase the
# capacity using this setting. Please make sure your disk drive has capacity greater than
# the size specified here. If both max_bytes and max_events are specified, Logstash will pick
# whichever criteria is reached first
# Default is 1024mb or 1gb
#
# queue.max_bytes: 1024mb
#
# If using queue.type: persisted, the maximum number of acked events before forcing a checkpoint
# Default is 1024, 0 for unlimited
#
# queue.checkpoint.acks: 1024
#
# If using queue.type: persisted, the maximum number of written events before forcing a checkpoint
# Default is 1024, 0 for unlimited
#
# queue.checkpoint.writes: 1024
#
# If using queue.type: persisted, the interval in milliseconds when a checkpoint is forced on the head page
# Default is 1000, 0 for no periodic checkpoint.
#
# queue.checkpoint.interval: 1000
#
# ------------ Dead-Letter Queue Settings --------------
# Flag to turn on dead-letter queue.
#

dead_letter_queue.enable: true

######dead_letter_queue.max_bytes: 4gb

# If using dead_letter_queue.enable: true, the maximum size of each dead letter queue. Entries
# will be dropped if they would increase the size of the dead letter queue beyond this setting.
# Default is 1024mb
# dead_letter_queue.max_bytes: 1024mb

# If using dead_letter_queue.enable: true, the interval in milliseconds where if no further events eligible for the DLQ
# have been created, a dead letter queue file will be written. A low value here will mean that more, smaller, queue files
# may be written, while a larger value will introduce more latency between items being "written" to the dead letter queue, and
# being available to be read by the dead_letter_queue input when items are written infrequently.
# Default is 5000.
#
# dead_letter_queue.flush_interval: 5000

# If using dead_letter_queue.enable: true, controls which entries should be dropped to avoid exceeding the size limit.
# Set the value to `drop_newer` (default) to stop accepting new events that would push the DLQ size over the limit.
# Set the value to `drop_older` to remove queue pages containing the oldest events to make space for new ones.
#
# dead_letter_queue.storage_policy: drop_newer

# If using dead_letter_queue.enable: true, the interval that events have to be considered valid. After the interval has
# expired the events could be automatically deleted from the DLQ.
# The interval could be expressed in days, hours, minutes or seconds, using as postfix notation like 5d,
# to represent a five days interval.
# The available units are respectively d, h, m, s for day, hours, minutes and seconds.
# If not specified then the DLQ doesn't use any age policy for cleaning events.
#
# dead_letter_queue.retain.age: 1d

# If using dead_letter_queue.enable: true, defines the action to take when the dead_letter_queue.max_bytes is reached,
# could be "drop_newer" or "drop_older".
# With drop_newer, messages that were inserted most recently are dropped, logging an error line.
# With drop_older setting, the oldest messages are dropped as new ones are inserted.
# Default value is "drop_newer".
# dead_letter_queue.storage_policy: drop_newer

# If using dead_letter_queue.enable: true, the directory path where the data files will be stored.
# Default is path.data/dead_letter_queue
#
# path.dead_letter_queue:
#
# ------------ Debugging Settings --------------
#
# Options for log.level:
#   * fatal
#   * error
#   * warn
#   * info (default)
#   * debug
#   * trace
#
# log.level: info
path.logs: /usr/share/logstash/logs
#
# ------------ Other Settings --------------
#
# Allow or block running Logstash as superuser (default: true)
# allow_superuser: false
#
# Where to find custom plugins
# path.plugins: []
#
# Flag to output log lines of each pipeline in its separate log file. Each log filename contains the pipeline.name
# Default is false
# pipeline.separate_logs: false
#
# ------------ X-Pack Settings (not applicable for OSS build)--------------
#
# X-Pack Monitoring
# https://www.elastic.co/guide/en/logstash/current/monitoring-logstash.html
#xpack.monitoring.enabled: false
#xpack.monitoring.elasticsearch.username: logstash_system
#xpack.monitoring.elasticsearch.password: password
#xpack.monitoring.elasticsearch.proxy: ["http://proxy:port"]
#xpack.monitoring.elasticsearch.hosts: ["https://es1:9200", "https://es2:9200"]
# an alternative to hosts + username/password settings is to use cloud_id/cloud_auth
#xpack.monitoring.elasticsearch.cloud_id: monitoring_cluster_id:xxxxxxxxxx
#xpack.monitoring.elasticsearch.cloud_auth: logstash_system:password
# another authentication alternative is to use an Elasticsearch API key
#xpack.monitoring.elasticsearch.api_key: "id:api_key"
#xpack.monitoring.elasticsearch.ssl.certificate_authority: "/path/to/ca.crt"
#xpack.monitoring.elasticsearch.ssl.ca_trusted_fingerprint: xxxxxxxxxx
#xpack.monitoring.elasticsearch.ssl.truststore.path: path/to/file
#xpack.monitoring.elasticsearch.ssl.truststore.password: password
#xpack.monitoring.elasticsearch.ssl.keystore.path: /path/to/file
#xpack.monitoring.elasticsearch.ssl.keystore.password: password
#xpack.monitoring.elasticsearch.ssl.verification_mode: certificate
#xpack.monitoring.elasticsearch.sniffing: false
#xpack.monitoring.collection.interval: 10s
#xpack.monitoring.collection.pipeline.details.enabled: true
#
# X-Pack Management
# https://www.elastic.co/guide/en/logstash/current/logstash-centralized-pipeline-management.html
#xpack.management.enabled: false
#xpack.management.pipeline.id: ["main", "apache_logs"]
#xpack.management.elasticsearch.username: logstash_admin_user
#xpack.management.elasticsearch.password: password
#xpack.management.elasticsearch.proxy: ["http://proxy:port"]
#xpack.management.elasticsearch.hosts: ["https://es1:9200", "https://es2:9200"]
# an alternative to hosts + username/password settings is to use cloud_id/cloud_auth
#xpack.management.elasticsearch.cloud_id: management_cluster_id:xxxxxxxxxx
#xpack.management.elasticsearch.cloud_auth: logstash_admin_user:password
# another authentication alternative is to use an Elasticsearch API key
#xpack.management.elasticsearch.api_key: "id:api_key"
#xpack.management.elasticsearch.ssl.ca_trusted_fingerprint: xxxxxxxxxx
#xpack.management.elasticsearch.ssl.certificate_authority: "/path/to/ca.crt"
#xpack.management.elasticsearch.ssl.truststore.path: /path/to/file
#xpack.management.elasticsearch.ssl.truststore.password: password
#xpack.management.elasticsearch.ssl.keystore.path: /path/to/file
#xpack.management.elasticsearch.ssl.keystore.password: password
#xpack.management.elasticsearch.ssl.verification_mode: certificate
#xpack.management.elasticsearch.sniffing: false
#xpack.management.logstash.poll_interval: 5s

# X-Pack GeoIP plugin
# https://www.elastic.co/guide/en/logstash/current/plugins-filters-geoip.html#plugins-filters-geoip-manage_update
#xpack.geoip.download.endpoint: "https://geoip.elastic.co/v1/database"

pipelines.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html

- pipeline.id : pMain
  path.config : "/usr/share/logstash/config/vdcapp-pMainFluentD.conf"
  pipeline.workers : 4

##for Main-ASDI-k8s 
- pipeline.id : pecsMainK8S
  path.config : "/usr/share/logstash/config/vdcapp-ecs-pMain-ASDI-k8s.conf"
  pipeline.workers : 4

##for pSummary 
- pipeline.id : pSummary
  path.config : "/usr/share/logstash/config/vdcapp-ecs-pASDI-Summary.conf"
  pipeline.workers : 4
  
##for pecsUpdates
- pipeline.id : pecsUpdates
  path.config : "/usr/share/logstash/config/vdcapp-ecs-pASDIRecommendUpdates.conf"
  pipeline.workers : 2

##for pecsConfiguration
- pipeline.id : pecsConfiguration
  path.config : "/usr/share/logstash/config/vdcapp-ecs-pASDIConfiguration.conf"
  pipeline.workers : 2
  
##for ecs ALB-Summary 
- pipeline.id : pecsALB
  path.config : "/usr/share/logstash/config/vdcapp-ecs-pALB.conf"
  pipeline.workers : 2

##for ecs MGW-Summary 
- pipeline.id : pecsMGW
  path.config : "/usr/share/logstash/config/vdcapp-ecs-pMGW.conf"
  pipeline.workers : 2
  
- pipeline.id : pCloudedcbridge
  path.config : "/usr/share/logstash/config/vdcapp-pCloudedcbridge.conf"
  pipeline.workers : 2

prashant1 · April 7, 2023, 11:41am

Hi @leandrojmp
We are also facing a similar issue: queue.drain is not working as expected.

Could you please help us resolve the issue?

leandrojmp · April 7, 2023, 2:02pm

I see nothing wrong in the configuration, and your logs do not show anything wrong either.

What do you have inside the queue folder?

leandrojmp · April 7, 2023, 2:03pm

I suggest that you open a different topic and provide information about your issue, as it could be completely different.

Karthik_N · April 7, 2023, 5:13pm

Hi @leandrojmp
Please find the structure of the persistent queue below:

logstash@logstash-fluentd-asdi-0:~/data/queue$ ls -lrt
total 32
drwxrwsr-x 2 logstash logstash 4096 Apr  5 17:03 pecsARUp
drwxrwsr-x 2 logstash logstash 4096 Apr  5 17:03 pecsAConf
drwxrwsr-x 2 logstash logstash 4096 Apr  6 14:26 pCb
drwxrwsr-x 2 logstash logstash 4096 Apr  7 17:05 pecsALB
drwxrwsr-x 2 logstash logstash 4096 Apr  7 17:09 pecsASu
drwxrwsr-x 2 logstash logstash 4096 Apr  7 17:09 pecsMGW
drwxrwsr-x 2 logstash logstash 4096 Apr  7 17:09 pMainFluentD
drwxrwsr-x 2 logstash logstash 4096 Apr  7 17:09 pecsAIMainK8S
logstash@logstash-fluentd-asdi-0:~/data/queue$ du -sh *
33M     pCb
17M     pecsALB
16K     pecsAConf
56M     pecsAIMainK8S
16K     pecsARUp
1.3M    pecsASu
13M     pecsMGW
50M     pMainFluentD
logstash@logstash-fluentd-asdi-0:~/data/queue$ cd pecsASu
logstash@logstash-fluentd-asdi-0:~/data/queue/pecsASu$ ls -lrt
total 1348
-rw-r--r-- 1 logstash logstash 67108864 Apr  7 17:09 page.219
-rw-r--r-- 1 logstash logstash       34 Apr  7 17:09 checkpoint.head
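
A note on reading this listing: page.219 reports 67108864 bytes, which is exactly the default queue.page_capacity of 64mb, while ls reports "total 1348" and du reports 1.3M for the same directory. That suggests the page file takes up far less disk space than its apparent size, so the file size alone does not show how many events are still waiting in the queue. The sketch below, in Python, compares apparent size with allocated size. It assumes a Unix-like system where st_blocks is available, and the file name page.demo is hypothetical (a sparse stand-in, not a real Logstash page).

```python
import os
import tempfile

# Default queue.page_capacity (64mb) expressed in bytes.
PAGE_CAPACITY = 64 * 1024 * 1024


def apparent_and_allocated(path: str) -> tuple[int, int]:
    """Return (apparent size, allocated size) of a file in bytes."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on Linux; it is not available on Windows.
    return st.st_size, st.st_blocks * 512


# Demonstration with a sparse stand-in file instead of a real queue page.
with tempfile.TemporaryDirectory() as tmp:
    demo_page = os.path.join(tmp, "page.demo")  # hypothetical file name
    with open(demo_page, "wb") as handle:
        handle.truncate(PAGE_CAPACITY)  # sets the size without writing data
    apparent, allocated = apparent_and_allocated(demo_page)

print(f"apparent size:  {apparent} bytes")
print(f"allocated size: {allocated} bytes")
```

Pointing apparent_and_allocated at a real page file inside the pod would show the same two numbers that ls -l and du report.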

system · April 7, 2023, 5:13pm

logstash 67108864 is EOL and no longer supported. Please upgrade ASAP.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns )

leandrojmp · April 7, 2023, 5:47pm

Something is not matching up.

These are the pipelines you shared from your pipelines.yml

pMain
pecsMainK8S
pSummary
pecsUpdates
pecsConfiguration
pecsALB
pecsMGW
pCloudedcbridge

And these are the queues you have:

pCb
pecsALB
pecsAConf
pecsAIMainK8S
pecsARUp
pecsASu
pecsMGW
pMainFluentD

You should have one directory named after each pipeline you have, but from the pipelines you shared only two have queues in that path:

pecsALB
pecsMGW

The other pipelines do not show any queues in this path, and you also have queues for pipelines that are not in your pipelines.yml.
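
The comparison above can be reproduced with a few lines of Python. This is just a sketch of the manual check, using the pipeline IDs and directory names exactly as they were posted in this thread; the function name compare_queues is made up for illustration.

```python
# Pipeline IDs as listed in the shared pipelines.yml.
pipeline_ids = {
    "pMain", "pecsMainK8S", "pSummary", "pecsUpdates",
    "pecsConfiguration", "pecsALB", "pecsMGW", "pCloudedcbridge",
}

# Directory names from the `ls` output of the data/queue folder.
queue_dirs = {
    "pCb", "pecsALB", "pecsAConf", "pecsAIMainK8S",
    "pecsARUp", "pecsASu", "pecsMGW", "pMainFluentD",
}


def compare_queues(pipelines: set[str], queues: set[str]) -> dict[str, list[str]]:
    """Group names by whether they appear as a pipeline, a queue, or both."""
    return {
        "matched": sorted(pipelines & queues),
        "pipelines_without_queue": sorted(pipelines - queues),
        "queues_without_pipeline": sorted(queues - pipelines),
    }


result = compare_queues(pipeline_ids, queue_dirs)
for group, names in result.items():
    print(f"{group}: {names}")
```

Only pecsALB and pecsMGW end up in the matched group; the other six pipelines and six directories have no counterpart.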

How are you mounting this path? You didn't specify any path.queue or path.data in your logstash.yml.

Are you sure that your queues are using the persisted volume and not the container volume? What do you have on the logs when you start logstash? It should output the path it is using.

Also, are you sharing this path with multiple instances?

Karthik_N · April 10, 2023, 8:29am

You should have one directory named after each pipeline you have, but from the pipelines you shared only two have queues in that path:

We do have queues with the same names as the pipelines in pipelines.yml. For project confidentiality I deliberately edited the names, which is why they did not match. Please find the queues we have below. I have also pasted the Logstash logs at the end; you may see some name mismatches there as well, please ignore those.

pMain
pecsMainK8S
pSummary
pecsUpdates
pecsConfiguration
pecsALB
pecsMGW
pCloudedcbridge

How are you mounting this path? You didn't specify any path.queue or path.data in your logstash.yml.

We are using the default paths for path.data and path.queue:

path.data: /usr/share/logstash/data
path.queue: path.data/queue   (i.e. /usr/share/logstash/data/queue)

Are you sure that your queues are using the persisted volume and not the container volume?

We are using an EBS volume as the persistent volume; each Logstash pod has one EBS PV attached.

What do you have on the logs when you start logstash? It should output the path it is using.

Using bundled JDK: /usr/share/logstash/jdk
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[INFO ] 2023-04-10 07:42:16.309 [main] runner - Starting Logstash {"logstash.version"=>"8.4.0", "jruby.version"=>"jruby 9.3.6.0 (2.6.8) 2022-06-27 7a2cbcd376 OpenJDK 64-Bit Server VM 17.0.4+8 on 17.0.4+8 +indy +jit [x86_64-linux]"}
[INFO ] 2023-04-10 07:42:16.315 [main] runner - JVM bootstrap flags: [-Xms4g, -Xmx8g, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djruby.compile.invokedynamic=true, -Djruby.jit.threshold=0, -XX:+HeapDumpOnOutOfMemoryError, -Djava.security.egd=file:/dev/urandom, -Dlog4j2.isThreadContextMapInheritable=true, -Dls.cgroup.cpuacct.path.override=/, -Dls.cgroup.cpu.path.override=/, -Xmx4g, -Xms4g, -Djruby.regexp.interruptible=true, -Djdk.io.File.enableADS=true, --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED, --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED, --add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED, --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED, --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED, --add-opens=java.base/java.security=ALL-UNNAMED, --add-opens=java.base/java.io=ALL-UNNAMED, --add-opens=java.base/java.nio.channels=ALL-UNNAMED, --add-opens=java.base/sun.nio.ch=ALL-UNNAMED, --add-opens=java.management/sun.management=ALL-UNNAMED]
[INFO ] 2023-04-10 07:42:16.346 [main] settings - Creating directory {:setting=>"path.queue", :path=>"/usr/share/logstash/data/queue"}
[INFO ] 2023-04-10 07:42:16.356 [main] settings - Creating directory {:setting=>"path.dead_letter_queue", :path=>"/usr/share/logstash/data/dead_letter_queue"}
[INFO ] 2023-04-10 07:42:16.677 [LogStash::Runner] agent - No persistent UUID file found. Generating new UUID {:uuid=>"f69a1565-81d9-4717-a258-5cd90e0e2fde", :path=>"/usr/share/logstash/data/uuid"}
[INFO ] 2023-04-10 07:42:17.618 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600, :ssl_enabled=>false}
[WARN ] 2023-04-10 07:42:17.724 [Agent thread] persistedqueueconfigvalidator - The persistent queue on path "/usr/share/logstash/data/queue/pMainFluentD" won't fit in file system "/dev/nvme1n1" when full. Please free or allocate 47244640256 more bytes. The persistent queue on path "/usr/share/logstash/data/queue/pecsAGENTXK8S" won't fit in file system "/dev/nvme1n1" when full. Please free or allocate 47244640256 more bytes. The persistent queue on path "/usr/share/logstash/data/queue/pecsAGENTX" won't fit in file system "/dev/nvme1n1" when full. Please free or allocate 47244640256 more bytes. The persistent queue on path "/usr/share/logstash/data/queue/pecsAGENTXInflightStep" won't fit in file system "/dev/nvme1n1" when full. Please free or allocate 47244640256 more bytes. The persistent queue on path "/usr/share/logstash/data/queue/pecsAGENTXkafkamessenger" won't fit in file system "/dev/nvme1n1" when full. Please free or allocate 47244640256 more bytes. The persistent queue on path "/usr/share/logstash/data/queue/pecsAGENTXErrors" won't fit in file system "/dev/nvme1n1" when full. Please free or allocate 47244640256 more bytes. The persistent queue on path "/usr/share/logstash/data/queue/pecsALB" won't fit in file system "/dev/nvme1n1" when full. Please free or allocate 47244640256 more bytes. The persistent queue on path "/usr/share/logstash/data/queue/pecsASRCK8S" won't fit in file system "/dev/nvme1n1" when full. Please free or allocate 47244640256 more bytes. The persistent queue on path "/usr/share/logstash/data/queue/pecsASPCVRXSummary" won't fit in file system "/dev/nvme1n1" when full. Please free or allocate 47244640256 more bytes. The persistent queue on path "/usr/share/logstash/data/queue/pecsASRCErrors" won't fit in file system "/dev/nvme1n1" when full. Please free or allocate 47244640256 more bytes. The persistent queue on path "/usr/share/logstash/data/queue/pecsAGENTXHealthManager" won't fit in file system "/dev/nvme1n1" when full. Please free or allocate 47244640256 more bytes.
[INFO ] 2023-04-10 07:42:22.159 [Converge PipelineAction::Create<pecsAGENTXHealthManager>] Reflections - Reflections took 559 ms to scan 1 urls, producing 125 keys and 434 values
[INFO ] 2023-04-10 07:42:25.840 [Converge PipelineAction::Create<pecsASRCErrors>] QueueUpgrade - No PQ version file found, upgrading to PQ v2.
[INFO ] 2023-04-10 07:42:25.882 [Converge PipelineAction::Create<pecsASPCVRXSummary>] QueueUpgrade - No PQ version file found, upgrading to PQ v2.
[INFO ] 2023-04-10 07:42:25.886 [Converge PipelineAction::Create<pecsAGENTXHealthManager>] QueueUpgrade - No PQ version file found, upgrading to PQ v2.
[INFO ] 2023-04-10 07:42:25.919 [Converge PipelineAction::Create<pecsALB>] QueueUpgrade - No PQ version file found, upgrading to PQ v2.
[INFO ] 2023-04-10 07:42:25.917 [Converge PipelineAction::Create<pecsAGENTXkafkamessenger>] QueueUpgrade - No PQ version file found, upgrading to PQ v2.
[INFO ] 2023-04-10 07:42:25.978 [Converge PipelineAction::Create<pecsASRCErrors>] javapipeline - Pipeline `pecsASRCErrors` is configured with `pipeline.ecs_compatibility:disabled` setting. All plugins in this pipeline will default to `ecs_compatibility => disabled` unless explicitly configured otherwise.
[INFO ] 2023-04-10 07:42:25.978 [Converge PipelineAction::Create<pecsALB>] javapipeline - Pipeline `pecsALB` is configured with `pipeline.ecs_compatibility: disabled` setting. All plugins in this pipeline will default to `ecs_compatibility => disabled` unless explicitly configured otherwise.

[WARN ] 2023-04-10 07:42:26.202 [[pMainFluentD]-pipeline-manager] LazyDelegatingGauge - A gauge metric of an unknown type (org.jruby.specialized.RubyArrayOneObject) has been created for key: send_to. This may result in invalid serialization.  It is recommended to log an issue to the responsible developer/development team.
[WARN ] 2023-04-10 07:42:26.211 [[pMainFluentD]-pipeline-manager] LazyDelegatingGauge - A gauge metric of an unknown type (org.jruby.specialized.RubyArrayOneObject) has been created for key: send_to. This may result in invalid serialization.  It is recommended to log an issue to the responsible developer/development team.
[INFO ] 2023-04-10 07:42:26.224 [Converge PipelineAction::Create<pecsASRCK8S>] QueueUpgrade - No PQ version file found, upgrading to PQ v2.
[WARN ] 2023-04-10 07:42:26.224 [[pMainFluentD]-pipeline-manager] LazyDelegatingGauge - A gauge metric of an unknown type (org.jruby.specialized.RubyArrayOneObject) has been created for key: send_to. This may result in invalid serialization.  It is recommended to log an issue to the responsible developer/development team.
[WARN ] 2023-04-10 07:42:26.227 [[pMainFluentD]-pipeline-manager] LazyDelegatingGauge - A gauge metric of an unknown type (org.jruby.specialized.RubyArrayOneObject) has been created for key: send_to. This may result in invalid serialization.  It is recommended to log an issue to the responsible developer/development team.
[WARN ] 2023-04-10 07:42:26.228 [[pMainFluentD]-pipeline-manager] LazyDelegatingGauge - A gauge metric of an unknown type (org.jruby.specialized.RubyArrayOneObject) has been created for key: send_to. This may result in invalid serialization.  It is recommended to log an issue to the responsible developer/development team.
[WARN ] 2023-04-10 07:42:26.229 [[pMainFluentD]-pipeline-manager] LazyDelegatingGauge - A gauge metric of an unknown type (org.jruby.specialized.RubyArrayOneObject) has been created for key: send_to. This may result in invalid serialization.  It is recommended to log an issue to the responsible developer/development team.
[INFO ] 2023-04-10 07:42:26.234 [Converge PipelineAction::Create<pecsASRCK8S>] javapipeline - Pipeline `pecsASRCK8S` is configured with `pipeline.ecs_compatibility: disabled` setting. All plugins in this pipeline will default to `ecs_compatibility => disabled` unless explicitly configured otherwise.
[INFO ] 2023-04-10 07:42:28.109 [[pecsASRCK8S]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"pecsASRCK8S", "pipeline.workers"=>2, "pipeline.batch.size"=>1500, "pipeline.batch.delay"=>600, "pipeline.max_inflight"=>3000, "pipeline.sources"=>["/usr/share/logstash/config/vdcapp-ecs-pMain-ASRC-k8s.conf"], :thread=>"#<Thread:0x2eee20a7 run>"}
[INFO ] 2023-04-10 07:42:28.117 [[pecsAGENTXK8S]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"pecsAGENTXK8S", "pipeline.workers"=>4, "pipeline.batch.size"=>1500, "pipeline.batch.delay"=>600, "pipeline.max_inflight"=>6000, "pipeline.sources"=>["/usr/share/logstash/config/vdcapp-ecs-pMain-AGENTX-k8s.conf"], :thread=>"#<Thread:0x7f62ecf0 run>"}
[INFO ] 2023-04-10 07:42:28.122 [[pecsASPCVRXSummary]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"pecsASPCVRXSummary", "pipeline.workers"=>2, "pipeline.batch.size"=>1500, "pipeline.batch.delay"=>600, "pipeline.max_inflight"=>3000, "pipeline.sources"=>["/usr/share/logstash/config/vdcapp-ecs-pASPCVRXSummary.conf"], :thread=>"#<Thread:0x7381d2f8 run>"}
[INFO ] 2023-04-10 07:42:28.230 [[pecsAGENTXkafkamessenger]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"pecsAGENTXkafkamessenger", "pipeline.workers"=>4, "pipeline.batch.size"=>1500, "pipeline.batch.delay"=>600, "pipeline.max_inflight"=>6000, "pipeline.sources"=>["/usr/share/logstash/config/vdcapp-ecs-pAgentxKafkaMessengerService.conf"], :thread=>"#<Thread:0x7b039b1f@/usr/share/logstash/logstash-core/lib/logstash/pipelines_registry.rb:159 run>"}
[INFO ] 2023-04-10 07:42:28.247 [[pecsAGENTX]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"pecsAGENTX", "pipeline.workers"=>4, "pipeline.batch.size"=>1500, "pipeline.batch.delay"=>600, "pipeline.max_inflight"=>6000, "pipeline.sources"=>["/usr/share/logstash/config/vdcapp-ecs-pAGENTX.conf"], :thread=>"#<Thread:0x3f7eaafa run>"}
[INFO ] 2023-04-10 07:42:28.369 [[pecsAGENTXHealthManager]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"pecsAGENTXHealthManager", "pipeline.workers"=>2, "pipeline.batch.size"=>1500, "pipeline.batch.delay"=>600, "pipeline.max_inflight"=>3000, "pipeline.sources"=>["/usr/share/logstash/config/vdcapp-ecs-pAgentxHealthManager.conf"], :thread=>"#<Thread:0x7e8ef363 run>"}
[WARN ] 2023-04-10 07:42:28.380 [[pecsALB]-pipeline-manager] javapipeline - 'pipeline.ordered' is enabled and is likely less efficient, consider disabling if preserving event order is not necessary
[INFO ] 2023-04-10 07:42:28.405 [[pecsALB]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"pecsALB", "pipeline.workers"=>1, "pipeline.batch.size"=>1500, "pipeline.batch.delay"=>600, "pipeline.max_inflight"=>1500, "pipeline.sources"=>["/usr/share/logstash/config/vdcapp-ecs-pALB.conf"], :thread=>"#<Thread:0x342f5592@/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:130 run>"}
[INFO ] 2023-04-10 07:42:28.552 [[pecsAGENTXInflightStep]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"pecsAGENTXInflightStep", "pipeline.workers"=>4, "pipeline.batch.size"=>1500, "pipeline.batch.delay"=>600, "pipeline.max_inflight"=>6000, "pipeline.sources"=>["/usr/share/logstash/config/vdcapp-ecs-pAGENTXInflightStep.conf"], :thread=>"#<Thread:0x35f26990 run>"}
[INFO ] 2023-04-10 07:42:30.691 [[pecsALB]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>2.27}
[INFO ] 2023-04-10 07:42:30.862 [[pecsAGENTXHealthManager]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>2.49}
[INFO ] 2023-04-10 07:42:31.024 [[pecsAGENTXHealthManager]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"pecsAGENTXHealthManager"}
[INFO ] 2023-04-10 07:42:31.112 [[pecsALB]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"pecsALB"}
[INFO ] 2023-04-10 07:42:31.806 [[pecsASRCErrors]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>4.14}
[INFO ] 2023-04-10 07:42:31.875 [[pecsASRCErrors]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"pecsASRCErrors"}
[INFO ] 2023-04-10 07:42:32.070 [[pecsAGENTX]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>3.82}
[INFO ] 2023-04-10 07:42:32.119 [[pecsAGENTXkafkamessenger]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>3.88}
[INFO ] 2023-04-10 07:42:32.133 [[pecsASPCVRXSummary]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>3.99}
[INFO ] 2023-04-10 07:42:32.143 [[pMainFluentD]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>5.59}
[INFO ] 2023-04-10 07:42:32.159 [[pecsAGENTX]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"pecsAGENTX"}
[INFO ] 2023-04-10 07:42:32.299 [[pecsASPCVRXSummary]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"pecsASPCVRXSummary"}
[INFO ] 2023-04-10 07:42:32.339 [[pecsAGENTXkafkamessenger]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"pecsAGENTXkafkamessenger"}
[INFO ] 2023-04-10 07:42:33.013 [[pecsASRCK8S]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>4.89}
[INFO ] 2023-04-10 07:42:33.018 [[pecsAGENTXInflightStep]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>4.43}
[INFO ] 2023-04-10 07:42:33.097 [[pecsASRCK8S]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"pecsASRCK8S"}
[INFO ] 2023-04-10 07:42:33.128 [[pecsAGENTXInflightStep]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"pecsAGENTXInflightStep"}
[INFO ] 2023-04-10 07:42:33.243 [[pecsAGENTXK8S]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>5.09}
[INFO ] 2023-04-10 07:42:33.266 [[pecsAGENTXK8S]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"pecsAGENTXK8S"}
[INFO ] 2023-04-10 07:42:33.560 [Converge PipelineAction::Create<pecsAGENTXErrors>] QueueUpgrade - No PQ version file found, upgrading to PQ v2.
[INFO ] 2023-04-10 07:42:33.572 [Converge PipelineAction::Create<pecsAGENTXErrors>] javapipeline - Pipeline `pecsAGENTXErrors` is configured with `pipeline.ecs_compatibility: disabled` setting. All plugins in this pipeline will default to `ecs_compatibility => disabled` unless explicitly configured otherwise.
[INFO ] 2023-04-10 07:42:34.051 [[pecsAGENTXErrors]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"pecsAGENTXErrors", "pipeline.workers"=>4, "pipeline.batch.size"=>1500, "pipeline.batch.delay"=>600, "pipeline.max_inflight"=>6000, "pipeline.sources"=>["/usr/share/logstash/config/vdcapp-ecs-pAgentxErrors.conf"], :thread=>"#<Thread:0x3ac424f8 run>"}
[INFO ] 2023-04-10 07:42:34.171 [[pMainFluentD]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"pMainFluentD"}
[INFO ] 2023-04-10 07:42:34.195 [[pMainFluentD]<tcp] tcp - Starting tcp input listener {:address=>"0.0.0.0:24284", :ssl_enable=>true}
[INFO ] 2023-04-10 07:42:37.338 [[pecsAGENTXErrors]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>3.28}
[INFO ] 2023-04-10 07:42:37.352 [[pecsAGENTXErrors]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"pecsAGENTXErrors"}
[INFO ] 2023-04-10 07:42:37.405 [Agent thread] agent - Pipelines running {:count=>11, :running_pipelines=>[:pecsALB, :pecsAGENTXkafkamessenger, :pecsASPCVRXSummary, :pecsASRCErrors, :pecsAGENTXHealthManager, :pecsAGENTX, :pMainFluentD, :pecsASRCK8S, :pecsAGENTXInflightStep, :pecsAGENTXK8S, :pecsAGENTXErrors], :non_running_pipelines=>[]}

Note: Even in a regular Logstash setup without K8s, the queue does not drain when queue.drain: true is enabled.

leandrojmp · April 10, 2023, 12:34pm

We have queues with same name mentioned in pipelines.yml, but project secrecy I have edited it purposefully and matched differently

This was misleading and makes it hard to help. If you need to anonymize anything, please keep the same anonymized value wherever it appears.

We are using the default path for path.data and path.queue

Yeah, but this does not explain how you are mounting the persistent volume on your container. You need to share this.

You have a couple of warnings on this, basically this means that the filesystem for your persistent queues does not have enough space for all of your queues.

How much space does this filesystem have?

In your logstash.yml you have queue.max_bytes set to 4GB, so you will need at least something around 32 GB of space just for the queues.

Please provide evidence of this, configuration and logs.

Karthik_N · April 11, 2023, 11:55am

Below is the Logstash StatefulSet YAML configuration used in our K8s deployment:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  creationTimestamp: "2023-03-28T13:40:12Z"
  generation: 78
  name: logstash-fluentd-asdi
  namespace: logstash
  resourceVersion: "18931052"
spec:
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      app: logstash-fluentd-asdi
  serviceName: logstash-fluentd-asdi
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2023-03-31T16:55:10+02:00"
      creationTimestamp: null
      labels:
        app: logstash-fluentd-asdi
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - logstash-fluentd-asdi
            topologyKey: kubernetes.io/hostname
      automountServiceAccountToken: true
      containers:
      - env:
        - name: LS_JAVA_OPTS
          value: -Xmx4g -Xms4g
        image: logstash:8.4.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 300
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: fluentd
          timeoutSeconds: 5
        name: logstash-fluentd
        ports:
        - containerPort: 24284
          name: fluentd
          protocol: TCP
        - containerPort: 5044
          name: filbeat
          protocol: TCP
        - containerPort: 9600
          name: default
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 3
          tcpSocket:
            port: fluentd
          timeoutSeconds: 5
        resources:
          limits:
            cpu: "2"
            memory: 8Gi
          requests:
            cpu: 400m
            memory: 1500Mi
        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            drop:
            - ALL
          privileged: false
          readOnlyRootFilesystem: false
          runAsNonRoot: true
          runAsUser: 1000
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/logstash/config
          mountPropagation: None
          name: efs-storage
          readOnly: true
          subPath: asdi-pipelines
        - mountPath: /usr/share/logstash/config/cert
          mountPropagation: None
          name: efs-storage
          readOnly: true
          subPath: cert
        - mountPath: /usr/share/logstash/data
          mountPropagation: None
          name: ebs-data-storage
      dnsPolicy: ClusterFirst
      enableServiceLinks: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1000
        runAsNonRoot: false
        runAsUser: 1000
      shareProcessNamespace: false
      terminationGracePeriodSeconds: 120
      volumes:
      - name: efs-storage
        persistentVolumeClaim:
          claimName: efs-claim
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: ebs-data-storage
      namespace: default
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 6Gi
      storageClassName: ebs-sc
      volumeMode: Filesystem
    status:
      phase: Pending

You have a couple of warnings on this, basically this means that the filesystem for your persistent queues does not have enough space for all of your queues.

Thanks for pointing this out. We will provision the space based on the number of queues.

Please provide evidence of this, configuration and logs.

The following Logstash configuration and logs are from a setup without K8s.

Step 1: Enabled the queue.drain: true option in logstash.yml
=================================================
path.data: /data/logstash
pipeline.workers: 50
pipeline.batch.size: 1500
pipeline.batch.delay: 600
config.reload.automatic: true
queue.type : persisted
queue.max_bytes : 6gb
queue.drain: true
dead_letter_queue.enable: true
dead_letter_queue.max_bytes: 4gb
path.logs: /var/log/logstash
#
# ------------ Other Settings --------------
#
# Where to find custom plugins
# path.plugins: []
#
# ------------ X-Pack Settings (not applicable for OSS build)--------------
#
# X-Pack Monitoring
# https://www.elastic.co/guide/en/logstash/current/monitoring-logstash.html
xpack.monitoring.enabled: true

Step 2: Compared the queue size before and after stopping Logstash; the queue is not drained even after the stop. Please refer to the screenshots below as evidence
=======================================================
![image|690x424](upload://egY8sB1aHodG7RyebZnTqD5jZMl.png)
![logstash_queue_status|690x280](upload://r96ZQJszkmFIFReVWV9UqLgoVqY.jpeg)


Step 3: Stopped Logstash and checked the output logs; the Logstash logs after the stop are below
=============================================================
[2023-04-11T12:13:53,542][WARN ][logstash.runner          ] SIGTERM received. Shutting down.
[2023-04-11T12:13:54,827][INFO ][logstash.javapipeline    ][pMoogsoft] Pipeline terminated {"pipeline.id"=>"pMoogsoft"}
[2023-04-11T12:13:55,608][INFO ][logstash.javapipeline    ][pCompassXPMapping] Pipeline terminated {"pipeline.id"=>"pCompassXPMapping"}
[2023-04-11T12:13:56,503][INFO ][logstash.javapipeline    ][pHttp] Pipeline terminated {"pipeline.id"=>"pHttp"}
[2023-04-11T12:13:57,870][INFO ][logstash.javapipeline    ][pMainFilebeat] Pipeline terminated {"pipeline.id"=>"pMainFilebeat"}
[2023-04-11T12:13:57,896][INFO ][logstash.javapipeline    ][pCompass] Pipeline terminated {"pipeline.id"=>"pCompass"}
[2023-04-11T12:13:58,150][INFO ][logstash.javapipeline    ][pMainFluentD] Pipeline terminated {"pipeline.id"=>"pMainFluentD"}
[2023-04-11T12:13:58,910][INFO ][logstash.javapipeline    ][pecsASPCMain] Pipeline terminated {"pipeline.id"=>"pecsASPCMain"}
[2023-04-11T12:13:59,911][INFO ][logstash.javapipeline    ][pecsASPCJson] Pipeline terminated {"pipeline.id"=>"pecsASPCJson"}
[2023-04-11T12:14:00,533][INFO ][logstash.javapipeline    ][pExtendedVehicleAppLog] Pipeline terminated {"pipeline.id"=>"pExtendedVehicleAppLog"}
[2023-04-11T12:14:00,625][INFO ][logstash.javapipeline    ][pecsAGENTXK8S] Pipeline terminated {"pipeline.id"=>"pecsAGENTXK8S"}
[2023-04-11T12:14:00,645][INFO ][logstash.javapipeline    ][pecsASRCK8S] Pipeline terminated {"pipeline.id"=>"pecsASRCK8S"}
[2023-04-11T12:14:00,645][INFO ][logstash.javapipeline    ][pecsASPCErrors] Pipeline terminated {"pipeline.id"=>"pecsASPCErrors"}
[2023-04-11T12:14:00,646][INFO ][logstash.javapipeline    ][pecsVDSK8S] Pipeline terminated {"pipeline.id"=>"pecsVDSK8S"}
[2023-04-11T12:14:00,900][INFO ][logstash.javapipeline    ][pecsASPCBackgroundScheduler] Pipeline terminated {"pipeline.id"=>"pecsASPCBackgroundScheduler"}
[2023-04-11T12:14:00,905][INFO ][logstash.javapipeline    ][pExtendedVehicleSummary] Pipeline terminated {"pipeline.id"=>"pExtendedVehicleSummary"}
[2023-04-11T12:14:00,910][INFO ][logstash.javapipeline    ][pecsMessengerService] Pipeline terminated {"pipeline.id"=>"pecsMessengerService"}
[2023-04-11T12:14:01,503][INFO ][logstash.javapipeline    ][pCommonservicesK8s] Pipeline terminated {"pipeline.id"=>"pCommonservicesK8s"}
[2023-04-11T12:14:01,608][INFO ][logstash.javapipeline    ][pecsASPCInflightStep] Pipeline terminated {"pipeline.id"=>"pecsASPCInflightStep"}
[2023-04-11T12:14:01,615][INFO ][logstash.javapipeline    ][pecsVDPK8S] Pipeline terminated {"pipeline.id"=>"pecsVDPK8S"}
[2023-04-11T12:14:01,615][INFO ][logstash.javapipeline    ][pecsASDIMainK8S] Pipeline terminated {"pipeline.id"=>"pecsASDIMainK8S"}
[2023-04-11T12:14:01,617][INFO ][logstash.javapipeline    ][pecsASPCLoadBalancer] Pipeline terminated {"pipeline.id"=>"pecsASPCLoadBalancer"}
[2023-04-11T12:14:01,895][INFO ][logstash.javapipeline    ][pecsVDPFKTWDAService] Pipeline terminated {"pipeline.id"=>"pecsVDPFKTWDAService"}
[2023-04-11T12:14:02,605][INFO ][logstash.javapipeline    ][pCloudedcbridge] Pipeline terminated {"pipeline.id"=>"pCloudedcbridge"}
[2023-04-11T12:14:02,608][INFO ][logstash.javapipeline    ][pecsVDPFESTD] Pipeline terminated {"pipeline.id"=>"pecsVDPFESTD"}
[2023-04-11T12:14:02,895][INFO ][logstash.javapipeline    ][pecsMGW] Pipeline terminated {"pipeline.id"=>"pecsMGW"}
[2023-04-11T12:14:02,898][INFO ][logstash.javapipeline    ][pecsAGENTX] Pipeline terminated {"pipeline.id"=>"pecsAGENTX"}
[2023-04-11T12:14:02,901][INFO ][logstash.javapipeline    ][pecsVDPTELE2CVDS] Pipeline terminated {"pipeline.id"=>"pecsVDPTELE2CVDS"}
[2023-04-11T12:14:02,904][INFO ][logstash.javapipeline    ][pecsALB] Pipeline terminated {"pipeline.id"=>"pecsALB"}
[2023-04-11T12:14:03,143][INFO ][logstash.javapipeline    ][pecsVDPDDA] Pipeline terminated {"pipeline.id"=>"pecsVDPDDA"}
[2023-04-11T12:14:03,502][INFO ][logstash.javapipeline    ][pecsAGENTXInflightStep] Pipeline terminated {"pipeline.id"=>"pecsAGENTXInflightStep"}
[2023-04-11T12:14:03,505][INFO ][logstash.javapipeline    ][pecsAGENTXHealthManager] Pipeline terminated {"pipeline.id"=>"pecsAGENTXHealthManager"}
[2023-04-11T12:14:03,604][INFO ][logstash.javapipeline    ][pecsASRCErrors] Pipeline terminated {"pipeline.id"=>"pecsASRCErrors"}
[2023-04-11T12:14:03,607][INFO ][logstash.javapipeline    ][pecsASDIConfiguration] Pipeline terminated {"pipeline.id"=>"pecsASDIConfiguration"}
[2023-04-11T12:14:03,608][INFO ][logstash.javapipeline    ][pecsAGENTXkafkamessenger] Pipeline terminated {"pipeline.id"=>"pecsAGENTXkafkamessenger"}
[2023-04-11T12:14:03,613][INFO ][logstash.javapipeline    ][pecsASDISummary] Pipeline terminated {"pipeline.id"=>"pecsASDISummary"}
[2023-04-11T12:14:03,614][INFO ][logstash.javapipeline    ][pecsAGENTXErrors] Pipeline terminated {"pipeline.id"=>"pecsAGENTXErrors"}
[2023-04-11T12:14:03,616][INFO ][logstash.javapipeline    ][pecsASPCVRXSummary] Pipeline terminated {"pipeline.id"=>"pecsASPCVRXSummary"}
[2023-04-11T12:14:03,897][INFO ][logstash.javapipeline    ][pecsASDIRecommendUpdates] Pipeline terminated {"pipeline.id"=>"pecsASDIRecommendUpdates"}
[2023-04-11T12:14:04,503][INFO ][logstash.javapipeline    ][pecsVDSComponents] Pipeline terminated {"pipeline.id"=>"pecsVDSComponents"}
[2023-04-11T12:14:04,739][INFO ][logstash.runner          ] Logstash shut down.

leandrojmp · April 11, 2023, 12:36pm

Karthik_N:

  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: ebs-data-storage
      namespace: default
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 6Gi
      storageClassName: ebs-sc

I do not use Kubernetes, but if I'm not mistaken, the persistent volume for your Logstash data directory only has 6 GB, while in your logstash.yml you set queue.max_bytes to 4GB.

The queue.max_bytes setting applies to each queue. If you have 2 pipelines in pipelines.yml, you will have 2 queues and will need at least 8 GB. In the pipelines.yml you shared you have 8 pipelines, so you will have 8 queues, which will need at least 32 GB of disk space. You do not seem to be giving that much to your pod.

I'm not sure how Logstash will behave in this situation, but you can't expect it to behave correctly since it does not have enough space in the queue path.
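
The sizing argument above can be sketched as a small calculation. This is only an illustration: the function names are hypothetical, it assumes every pipeline uses the same queue.max_bytes from logstash.yml, and it treats "4GB" and "6Gi" as multiples of 1024^3 bytes.

```python
# Sketch: worst-case disk space needed by Logstash persistent queues.
# Assumption: every pipeline uses the same queue.max_bytes (no per-pipeline override).

GIB = 1024 ** 3


def required_queue_bytes(pipeline_count: int, max_bytes_per_queue: int) -> int:
    """Space needed if every queue fills up to queue.max_bytes."""
    return pipeline_count * max_bytes_per_queue


def fits_on_volume(pipeline_count: int, max_bytes_per_queue: int, volume_bytes: int) -> bool:
    """True if the volume can hold all queues at their maximum size."""
    return required_queue_bytes(pipeline_count, max_bytes_per_queue) <= volume_bytes


if __name__ == "__main__":
    # Numbers from this thread: 8 pipelines, queue.max_bytes of 4 GB, a 6 Gi volume claim.
    needed = required_queue_bytes(8, 4 * GIB)
    print(f"needed: {needed / GIB:.0f} GiB")
    print("fits on the 6 Gi volume:", fits_on_volume(8, 4 * GIB, 6 * GIB))
```

The same function can be reused with the actual pipeline count and queue.max_bytes of any other instance.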

Karthik_N · April 11, 2023, 12:44pm

Please refer to my previous post, where I provided 3 steps with the results of our test on a regular server without K8s.

leandrojmp · April 11, 2023, 12:49pm

I can't see the image you shared because it is in a code block; please edit your post.

But I see no evidence of any issue in the logs: your pipelines were all terminated cleanly, and there are no WARN or ERROR logs indicating any in-flight events.

Your issue with k8s is probably related to space. From what you shared you need at least 32 GB, and the persistent volume has only 6 GB.

Also, have you checked the queues or just the directory size? You can check the queues using the pqcheck utility as described in the documentation.
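
A minimal sketch of how pqcheck could be run against every queue, assuming the usual layout where each pipeline's queue lives in its own subdirectory under the queue path and pqcheck is at bin/pqcheck in the Logstash home. The helper name and the default paths are assumptions for illustration, so adjust them to your installation.

```python
# Sketch: build one pqcheck command per pipeline queue directory.
# Assumptions: queue_root contains one subdirectory per pipeline, and
# pqcheck is located at <logstash_home>/bin/pqcheck.
from pathlib import Path


def pqcheck_commands(queue_root, logstash_home="/usr/share/logstash"):
    """Return a list of argv lists, one per queue subdirectory, sorted by name."""
    pqcheck = str(Path(logstash_home) / "bin" / "pqcheck")
    root = Path(queue_root)
    return [[pqcheck, str(d)] for d in sorted(root.iterdir()) if d.is_dir()]


# Example usage (run on the Logstash host; the path is taken from this thread):
#   import subprocess
#   for cmd in pqcheck_commands("/data/logstash/queue"):
#       subprocess.run(cmd, check=False)
```

The output of each command can then be pasted into the thread as plain text, one block per pipeline.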

Karthik_N · April 11, 2023, 4:41pm

Step 2: Compared the queue size before and after stopping Logstash; the queue is not drained even after the stop. Please refer to the screenshots as evidence. This was tried without the K8s setup.

leandrojmp · April 11, 2023, 5:11pm

You will need to check each queue with the pqcheck mentioned in the previous answer.

Please, check each queue and share the result as plain text, not as images.
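
While Logstash is still running, another way to get per-queue numbers as plain text is the monitoring API on port 9600. The sketch below only parses a response that has already been fetched. It assumes the node stats response has a `pipelines` object whose entries carry a `queue` section with `type` and `events_count` (falling back to `events`); those field names should be verified against your Logstash version, and the helper name is hypothetical.

```python
# Sketch: extract the number of queued events per pipeline from a parsed
# node stats response (for example from GET /_node/stats/pipelines).
# Assumption: each pipeline entry has a "queue" object with "type" and
# "events_count" (or "events"); check this against your Logstash version.


def queued_events_by_pipeline(node_stats: dict) -> dict:
    """Map pipeline id to queued event count for persisted queues only."""
    result = {}
    for pipeline_id, stats in node_stats.get("pipelines", {}).items():
        queue = stats.get("queue") or {}
        if queue.get("type") != "persisted":
            continue  # memory queues have nothing on disk to drain
        result[pipeline_id] = queue.get("events_count", queue.get("events"))
    return result


# Example usage against a local instance:
#   import json, urllib.request
#   with urllib.request.urlopen("http://localhost:9600/_node/stats/pipelines") as resp:
#       print(queued_events_by_pipeline(json.load(resp)))
```

Comparing these counts just before the stop with the pqcheck output after the stop gives a clearer picture than the directory size alone.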

But just one thing: your original issue is related to your k8s cluster, and from what you shared your persistent volume does not have enough space for all your pipelines. I suggest that you do not mix things up; solve this first and see if the behavior is still the same.

Also, this last log that you shared shows almost 40 pipelines, which would need something near 240 GB just for the queues, the /data has this space, right?
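
The estimate above can be reproduced from the shutdown log itself. The sketch below counts the distinct pipelines that logged "Pipeline terminated" and multiplies by the per-queue limit. The function names are hypothetical, the sample lines are copied from the log shared earlier, and it assumes one persistent queue per pipeline with the 6gb queue.max_bytes from the posted logstash.yml.

```python
# Sketch: count pipelines in a Logstash shutdown log and estimate the
# worst-case queue space. Assumption: one persistent queue per pipeline,
# all using the same queue.max_bytes.
import re

TERMINATED = re.compile(r'Pipeline terminated \{"pipeline\.id"=>"([^"]+)"\}')


def terminated_pipelines(log_text: str) -> list:
    """Return the sorted, de-duplicated pipeline ids found in the log."""
    return sorted(set(TERMINATED.findall(log_text)))


def estimated_queue_gb(log_text: str, max_gb_per_queue: int) -> int:
    """Worst-case space in GB if every queue reaches queue.max_bytes."""
    return len(terminated_pipelines(log_text)) * max_gb_per_queue


SAMPLE_LOG = r'''
[2023-04-11T12:13:54,827][INFO ][logstash.javapipeline    ][pMoogsoft] Pipeline terminated {"pipeline.id"=>"pMoogsoft"}
[2023-04-11T12:13:55,608][INFO ][logstash.javapipeline    ][pCompassXPMapping] Pipeline terminated {"pipeline.id"=>"pCompassXPMapping"}
[2023-04-11T12:13:56,503][INFO ][logstash.javapipeline    ][pHttp] Pipeline terminated {"pipeline.id"=>"pHttp"}
'''

if __name__ == "__main__":
    print(terminated_pipelines(SAMPLE_LOG))
    print(estimated_queue_gb(SAMPLE_LOG, 6), "GB for this three-line sample")
```

Feeding the full log to `estimated_queue_gb` instead of the three-line sample gives the figure to compare against the free space on /data.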

Karthik_N · May 9, 2023, 12:21pm

Sorry for the delay

Please see below the size of the queue before and after stopping and starting Logstash. The size remains the same; it is not draining.

root@shared-int-elk-logstash2:/data/logstash# du -sh *
2.6G dead_letter_queue
1.3G queue
4.0K uuid
root@shared-int-elk-logstash2:/data/logstash# service logstash stop
root@shared-int-elk-logstash2:/data/logstash# du -sh *
2.6G dead_letter_queue
1.3G queue
4.0K uuid
root@shared-int-elk-logstash2:/data/logstash# service logstash status
● logstash.service - logstash
Loaded: loaded (/etc/systemd/system/logstash.service; disabled; vendor pre>
Active: inactive (dead)

May 09 12:08:03 shared-int-elk-logstash2 logstash[568943]: [2023-05-09T12:08:03>
May 09 12:08:04 shared-int-elk-logstash2 logstash[568943]: [2023-05-09T12:08:04>
May 09 12:08:04 shared-int-elk-logstash2 logstash[568943]: [2023-05-09T12:08:04>
May 09 12:08:04 shared-int-elk-logstash2 logstash[568943]: [2023-05-09T12:08:04>
May 09 12:08:04 shared-int-elk-logstash2 logstash[568943]: [2023-05-09T12:08:04>
May 09 12:08:04 shared-int-elk-logstash2 logstash[568943]: [2023-05-09T12:08:04>
May 09 12:08:04 shared-int-elk-logstash2 logstash[568943]: [2023-05-09T12:08:04>
May 09 12:08:05 shared-int-elk-logstash2 logstash[568943]: [2023-05-09T12:08:05>
May 09 12:08:06 shared-int-elk-logstash2 systemd[1]: logstash.service: Succeede>
May 09 12:08:06 shared-int-elk-logstash2 systemd[1]: Stopped logstash.

root@shared-int-elk-logstash2:/data/logstash# du -sh *
2.6G dead_letter_queue
1.3G queue
4.0K uuid
root@shared-int-elk-logstash2:/data/logstash# ^C
root@shared-int-elk-logstash2:/data/logstash# service logstash start
root@shared-int-elk-logstash2:/data/logstash# du -sh *
2.6G dead_letter_queue
1.3G queue
4.0K uuid
root@shared-int-elk-logstash2:/data/logstash#

Also, this last log that you shared shows almost 40 pipelines, which would need something near 240 GB just for the queues, the /data has this space, right?
<YES>

system · June 6, 2023, 12:22pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
