Stack Monitoring fails to load Logstash pipelines

Hi

I had the same issue in our environment (when I click on any pipeline, the "Loading" spinner just keeps spinning and nothing happens), and this error led me to this thread.
I was just able to fix it. It seems that Metricbeat does not pick up newly deployed or changed pipelines after it has started, so restarting Metricbeat resolved my issue.

I also had similar Internal Server errors. They were caused by configuration issues in my Logstash pipelines; once those were resolved, the errors disappeared.

logstash-plain-2021-09-13-21.log.gz:[2021-09-13T14:38:55,135][ERROR][logstash.agent           ] Internal API server error {:status=>500, :request_method=>"GET", :path_info=>"/_node/stats", :query_string=>"vertices=true", :http_version=>"HTTP/1.1", :http_accept=>nil, :error=>"Unexpected Internal Error", :class=>"LogStash::Instrument::MetricStore::MetricNotFound", :message=>"For path: events. Map keys: [:pipelines, :reloads]", :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/instrument/metric_store.rb:241:in `block in get_recursively'", "org/jruby/RubyArray.java:1809:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/instrument/metric_store.rb:240:in `get_recursively'", "/usr/share/logstash/logstash-core/lib/logstash/instrument/metric_store.rb:251:in `block in get_recursively'", "org/jruby/RubyArray.java:1809:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/instrument/metric_store.rb:240:in `get_recursively'", "/usr/share/logstash/logstash-core/lib/logstash/instrument/metric_store.rb:111:in `block in get'", "org/jruby/ext/thread/Mutex.java:164:in `synchronize'", "/usr/share/logstash/logstash-core/lib/logstash/instrument/metric_store.rb:110:in `get'", "/usr/share/logstash/logstash-core/lib/logstash/instrument/metric_store.rb:124:in `get_shallow'", "/usr/share/logstash/logstash-core/lib/logstash/instrument/metric_store.rb:173:in `block in extract_metrics'", "org/jruby/RubyArray.java:1809:in `each'", "org/jruby/RubyEnumerable.java:1126:in `inject'", "/usr/share/logstash/logstash-core/lib/logstash/instrument/metric_store.rb:149:in `extract_metrics'", "/usr/share/logstash/logstash-core/lib/logstash/api/service.rb:45:in `extract_metrics'", "/usr/share/logstash/logstash-core/lib/logstash/api/commands/base.rb:37:in `extract_metrics'", "/usr/share/logstash/logstash-core/lib/logstash/api/commands/stats.rb:73:in `events'", "/usr/share/logstash/logstash-core/lib/logstash/api/modules/node_stats.rb:57:in `events_payload'", 
"/usr/share/logstash/logstash-core/lib/logstash/api/modules/node_stats.rb:37:in `block in GET /?:filter?'", "org/jruby/RubyMethod.java:115:in `call'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1675:in `block in compile!'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1013:in `block in route!'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1032:in `route_eval'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1013:in `block in route!'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1061:in `block in process_route'", "org/jruby/RubyKernel.java:1189:in `catch'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1059:in `process_route'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1011:in `block in route!'", "org/jruby/RubyArray.java:1809:in `each'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1008:in `route!'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1129:in `block in dispatch!'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1101:in `block in invoke'", "org/jruby/RubyKernel.java:1189:in `catch'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1101:in `invoke'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1124:in `dispatch!'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:939:in `block in call!'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1101:in `block in invoke'", "org/jruby/RubyKernel.java:1189:in `catch'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1101:in 
`invoke'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:939:in `call!'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:929:in `call'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rack-protection-2.1.0/lib/rack/protection/xss_header.rb:18:in `call'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rack-protection-2.1.0/lib/rack/protection/path_traversal.rb:16:in `call'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rack-protection-2.1.0/lib/rack/protection/json_csrf.rb:26:in `call'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rack-protection-2.1.0/lib/rack/protection/base.rb:50:in `call'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rack-protection-2.1.0/lib/rack/protection/base.rb:50:in `call'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rack-protection-2.1.0/lib/rack/protection/frame_options.rb:31:in `call'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rack-2.2.3/lib/rack/null_logger.rb:11:in `call'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rack-2.2.3/lib/rack/head.rb:12:in `call'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:216:in `call'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1991:in `call'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rack-2.2.3/lib/rack/urlmap.rb:74:in `block in call'", "org/jruby/RubyArray.java:1809:in `each'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rack-2.2.3/lib/rack/urlmap.rb:58:in `call'", "/usr/share/logstash/logstash-core/lib/logstash/api/rack_app.rb:74:in `call'", "/usr/share/logstash/logstash-core/lib/logstash/api/rack_app.rb:48:in `call'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rack-2.2.3/lib/rack/builder.rb:244:in `call'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/puma-4.3.8-java/lib/puma/server.rb:718:in `handle_request'", 
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/puma-4.3.8-java/lib/puma/server.rb:472:in `process_client'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/puma-4.3.8-java/lib/puma/server.rb:328:in `block in run'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/puma-4.3.8-java/lib/puma/thread_pool.rb:134:in `block in spawn_thread'"]}

How can I find out which pipeline is causing this? I have around 30 production pipelines, so searching them all by hand would be too much, and these error logs don't give me much of a starting point.

By the way, restarting Metricbeat doesn't help; I've tried that a couple of times.

The "Internal API server error" messages are indeed unclear (they are unexpected errors in general, so it makes sense that Elastic can't explain them with a clear message), so they give no indication of what the actual issue is.
Other error messages are usually much clearer about where to look.

I do have a TST environment where I can test my configs; I'm not sure whether you can restart your PRD. My procedure is:

  • Restart Logstash
  • Look into the log messages for the (first) ERROR

If you could provide a grep of your "[ERROR]" log lines from the moment of your restart, I might be able to help you out (assuming the same thing is also causing your issue).
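For example, the grep I have in mind could look something like this (the log directory and filename pattern are assumptions based on a default package install; adjust them to your setup):

```shell
# Log directory is an assumption -- adjust to where your Logstash logs live
LOG_DIR=${LOG_DIR:-/var/log/logstash}

# Show the first ERROR entries since the last restart
grep -h '\[ERROR\]' "$LOG_DIR"/logstash-plain*.log 2>/dev/null | head -n 20
```

The first ERROR after a restart is usually the interesting one; later errors are often just consequences of it.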

There are some KV exceptions, but those are only warnings. Then there are a few Ruby exceptions that occur over time; they come from measuring log size, and some logs don't include the field from which the size is calculated. I'm not sure whether that is the problem, though, because they don't occur directly after a restart. Along with the "Internal API server error" comes only:

[2021-09-20T06:02:35,790][ERROR][logstash.agent           ] API HTTP Request {:status=>500, :request_method=>"GET", :path_info=>"/_node/stats", :query_string=>"vertices=true", :http_version=>"HTTP/1.1", :http_accept=>nil}

This piece is the response after the Internal Server Error (HTTP Status Code = 500):

In my experience (and I can only speak from my limited experience), the other errors broke my pipeline. You could check whether a field exists before operating on it. I'm sure you can do something similar in Ruby, like I do in my filter:

    if ![environment] {
        mutate {
            add_field => { "environment" => "unknown" }
        }
    }
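If the check has to happen inside a ruby filter itself (for example, when the Ruby code is what raises the exception), the same guard can be written with the Logstash event API. A sketch; the field name `environment` is just the example from above:

```
filter {
  ruby {
    # Only set a default when the field is missing -- this mirrors the
    # conditional + mutate above, but from inside a ruby filter
    code => '
      event.set("environment", "unknown") if event.get("environment").nil?
    '
  }
}
```

The same `event.get(...).nil?` guard can wrap whatever size calculation currently throws when the field is absent.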

Although I do agree that pipeline errors shouldn't be able to break monitoring!