Hi,
We have a Logstash deployment where we use the Node Stats API to independently track Logstash performance and status. Up to 7.2.0 the API behaved as documented, but after updating to 7.3.0 we are seeing inconsistent reporting from the API:
- We have multiple pipelines, and up to 7.2.0 all of them were reported. Now only one pipeline is reported, and which one it is changes with each restart. I can confirm the other pipelines are running, because I can see log messages such as
Pipelines running {:count=>4, :running_pipelines=>[:"1_snortalerts", :"2_monitor", :".monitoring-logstash", :"0_main"], :non_running_pipelines=>[]}
The actual API response looks like this:
$ curl -XGET 'http://localhost:9600/_node/stats/pipelines?pretty'
{
  "host" : "logstash2",
  "version" : "7.3.0",
  "http_address" : "127.0.0.1:9600",
  "id" : "e1b4029a-0efe-4438-bcef-e7460cb5bbb3",
  "name" : "logstash2",
  "ephemeral_id" : "1f735de7-e389-44c0-b043-96c046dbb51c",
  "status" : "green",
  "snapshot" : false,
  "pipeline" : {
    "workers" : 7,
    "batch_size" : 2000,
    "batch_delay" : 50
  },
  "pipelines" : {
    "0_main" : {
      "events" : {
        "in" : 1640089,
        "queue_push_duration_in_millis" : 103425,
        "out" : 1638527,
        "filtered" : 1638527,
        "duration_in_millis" : 3123654
      },
      "plugins" : {
        "inputs" : [ .. ],
        "codecs" : [ .. ],
        "filters" : [ .. ],
        "outputs" : [ .. ]
      },
      "reloads" : {
        "failures" : 0,
        "last_error" : null,
        "last_failure_timestamp" : null,
        "last_success_timestamp" : null,
        "successes" : 0
      },
      "queue" : {
        "type" : "persisted",
        "events_count" : 0,
        "queue_size_in_bytes" : 264287082,
        "max_queue_size_in_bytes" : 1073741824
      },
      "dead_letter_queue" : {
        "queue_size_in_bytes" : 2
      },
      "hash" : "f7e13af47e85c49bb8b0c9e095622b78d081f902f79b9cb992f1a510fd983ff5",
      "ephemeral_id" : "bf097cd2-c4ab-4fcf-9e57-7c5512ebb498"
    }
  }
}
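For anyone who wants to check their own node, here is a minimal Python sketch of how the reported pipelines could be compared against the ones Logstash logs as running. The helper names (`fetch_stats`, `missing_pipelines`) and the `EXPECTED` list are my own for illustration; the URL is the one from the curl call, and the sample dict is only a trimmed stand-in for the response shown above.

```python
import json
from urllib.request import urlopen

# Endpoint taken from the curl call in this post; adjust host/port as needed.
STATS_URL = "http://localhost:9600/_node/stats/pipelines"


def fetch_stats(url=STATS_URL):
    """Fetch and parse the Node Stats response (needs a running Logstash)."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def missing_pipelines(stats, expected):
    """Return the expected pipeline ids that the response does not list."""
    reported = set(stats.get("pipelines", {}))
    return sorted(set(expected) - reported)


# Pipeline ids copied from the "Pipelines running" log line.
EXPECTED = ["1_snortalerts", "2_monitor", ".monitoring-logstash", "0_main"]

# Trimmed stand-in for the 7.3.0 response; use fetch_stats() against a live node.
sample = {"version": "7.3.0", "pipelines": {"0_main": {"events": {"in": 1640089}}}}

print(missing_pipelines(sample, EXPECTED))
```

With a response like the one above, this would list the three pipelines that are running but not reported.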
- Trying to retrieve stats for a particular pipeline using
/_node/stats/pipelines/PIPELINENAME?pretty
returns stats only for the one pipeline that is listed when no pipeline name is given. Similarly, specifying a non-existent pipeline returns the same output (not an error or a "pipeline not found" message). For the above example, the output is the same for all of the following queries:
/_node/stats/pipelines?pretty
/_node/stats/pipelines/1_notmain?pretty
/_node/stats/pipelines/foo_notpresent?pretty
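As a workaround on the monitoring side, a client could at least verify that the response actually contains the pipeline it asked for. This is only a sketch under my own assumptions: `check_pipeline_response` is a hypothetical helper, and the sample dict is a trimmed stand-in for the response we get back.

```python
def check_pipeline_response(stats, requested):
    """Classify a parsed /_node/stats/pipelines/<requested> response."""
    reported = list(stats.get("pipelines", {}))
    if reported == [requested]:
        return "ok"
    if not reported:
        return "empty"
    # The response lists something other than (or in addition to) what was asked for.
    return "mismatch: got " + ", ".join(reported)


# Trimmed stand-in for what 7.3.0 returns regardless of the requested name.
sample = {"pipelines": {"0_main": {"events": {"in": 1640089}}}}

for name in ["0_main", "1_notmain", "foo_notpresent"]:
    print(name, "->", check_pipeline_response(sample, name))
```

This does not fix the API behaviour; it only keeps a monitoring script from silently attributing one pipeline's stats to another.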
- The structure of queue reporting has also changed. Prior versions had a sub-key "capacity" under which stats such as queue_size_in_bytes and max_queue_size_in_bytes were reported. Now these stats are reported directly under "queue".
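Until it is clear which layout is intended, a client could read the queue sizes from either place. A minimal sketch, assuming the only difference is whether the two fields sit under "capacity" or directly under "queue"; `queue_sizes` is a hypothetical helper, and the old-layout sample is reconstructed from the description above rather than copied from a real 7.2.0 response.

```python
def queue_sizes(queue):
    """Return (queue_size_in_bytes, max_queue_size_in_bytes) from either layout."""
    # Older layout: sizes nested under "capacity".
    # 7.3.0 layout: sizes directly under "queue".
    source = queue.get("capacity", queue)
    return source.get("queue_size_in_bytes"), source.get("max_queue_size_in_bytes")


# Layout seen on 7.3.0 (values from the response in this post).
new_layout = {
    "type": "persisted",
    "events_count": 0,
    "queue_size_in_bytes": 264287082,
    "max_queue_size_in_bytes": 1073741824,
}

# Assumed shape of the pre-7.3.0 layout, with the same values for comparison.
old_layout = {
    "type": "persisted",
    "capacity": {
        "queue_size_in_bytes": 264287082,
        "max_queue_size_in_bytes": 1073741824,
    },
}

print(queue_sizes(new_layout))
print(queue_sizes(old_layout))
```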
Is this a documented change or a bug?