My Elastic Stack Observability Wishlist

(Note, these are my personal wishes, not my workplace's.)

Since we started a trial at my workplace, I've been digging into everything the Elastic Stack has to offer, mostly focusing on Observability.

Here are some ideas I haven't yet seen a way to do in the stack as it is now (7.13). (I may well just not have found the right docs or screen, so feel free to point me in the right direction. :slight_smile: )

Alerting

Alerts should have their own screen

Managing alerts is a HUGE part of why we are looking at the Elastic Stack. Alert fatigue is a problem, so being able to easily manage alerts is essential. Right now, the UI for alerts in Kibana feels disorganized and confusing. I think its goal is to make it easy to add alerts, and it does do that, but working with alerts after they fire is not so nice.

Acknowledgement

How do you acknowledge an alert? So, I get an email saying that server Frost is at 100% disk usage. I need a way to jump in and say "Hey, I saw this, no need to keep emailing me."

The best I can find is buried in Stack Management -> Rules and Connectors -> < the alert >. You can mute some of the hosts there.

What I'd really rather have is a screen listing all hosts and active alerts. Then for each host I could set each alert to a useful status.

Escalation

How do you escalate an alert?

Here's what I mean by that:

If an alert is not acknowledged for a period of time, the system should automatically start trying to contact other people, based on a configured policy.

If a support person sees an alert, and knows it should go to another level of support, they should be able to click a button to have the system start telling the right people about it.

That kind of thing.
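
To make the escalation idea a bit more concrete, here is a rough sketch of the logic I have in mind. This is not an existing Elastic feature or API; the policy, the contact names, and the function names are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class EscalationStep:
    after: timedelta   # how long the alert has gone unacknowledged
    contacts: tuple    # who gets notified once that much time has passed


# Hypothetical policy: the contact names are placeholders, not real Kibana connectors.
POLICY = (
    EscalationStep(timedelta(minutes=0), ("on-call",)),
    EscalationStep(timedelta(minutes=15), ("team-lead",)),
    EscalationStep(timedelta(hours=1), ("it-manager",)),
)


def contacts_to_notify(unacked_for, acknowledged, policy=POLICY):
    """Automatic escalation: everyone whose step has been reached, unless acknowledged."""
    if acknowledged:
        return []
    contacts = []
    for step in policy:
        if unacked_for >= step.after:
            contacts.extend(step.contacts)
    return contacts


def escalate_now(current_level, policy=POLICY):
    """Manual escalation: contacts for the next level up, or [] if already at the top."""
    next_level = current_level + 1
    if next_level < len(policy):
        return list(policy[next_level].contacts)
    return []
```

The first function covers the "nobody acknowledged it" case, the second covers the "click a button to hand it to the next level" case.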

Observability -> Metrics

Let me see the entire hostname, please.

If you're not in the table view, and you have more than a few hosts, you cannot see more than a few characters of the hostname. Heck, sometimes the actual value for the load or usage is truncated. It would be really nice to be able to tell it that it should show the entire hostname.

Color code table rows

Just color code the table rows the same way the cards are color coded in the other view.

Let me see multiple metrics

In the temperature/card view, I'd love to be able to see CPU and RAM usage at the same time.

Add filesystem metrics

I was able to add the 'system.filesystem.used.pct' metric, but I don't see any way to filter that so it only uses data from specific mount points.
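
For comparison, this is roughly the filter I'd want the Metrics UI to apply, written as a plain Elasticsearch search body. It is only a sketch: it assumes the Metricbeat documents carry a `system.filesystem.mount_point` field next to `system.filesystem.used.pct`, and the function name and the 15 minute window are my own choices.

```python
import json


def filesystem_usage_query(mount_points):
    """Build a search body: max filesystem usage per host, limited to the given mount points."""
    return {
        "size": 0,  # only the aggregation is needed, not the raw hits
        "query": {
            "bool": {
                "filter": [
                    # Assumed field name for the mount point.
                    {"terms": {"system.filesystem.mount_point": list(mount_points)}},
                    {"range": {"@timestamp": {"gte": "now-15m"}}},
                ]
            }
        },
        "aggs": {
            "per_host": {
                "terms": {"field": "host.name"},
                "aggs": {
                    "max_used_pct": {"max": {"field": "system.filesystem.used.pct"}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(filesystem_usage_query(["/", "/var"]), indent=2))
```

The printed JSON could be pasted into Dev Tools against the Metricbeat indices to check the numbers.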

Observability -> Uptime

Let me set the default sort

Specifically, I want to set the overview table to show down items first.

Observability -> Logs

Let me easily filter out logs

Case in point is Fortinet logs. I really could use a way to quickly filter out that specific event.dataset. I mean, I can easily filter FOR it via the log details screen... At least I think that's what the "View event with filter" button does.
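
What I end up typing by hand is a negated KQL clause, so a "filter out" button would only need to generate something like the string below. This is a small sketch; the helper name is made up, and `fortinet.firewall` is just my assumption for what the dataset is called.

```python
def exclude_kql(field, value):
    """Return a KQL clause that excludes documents where field equals value."""
    # Escape backslashes and double quotes so the value stays one quoted phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'not {field} : "{escaped}"'


if __name__ == "__main__":
    # Dataset name is an assumption, adjust to whatever your events actually carry.
    print(exclude_kql("event.dataset", "fortinet.firewall"))
```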

Let me define a grok pattern

Ok, this would likely take a lot of work, but man would it be useful. (I think...)

When I see a log pattern, for example Drupal logs in syslog, I'd like to be able to select several entries, then build out a grok pattern for them that breaks the data out into useful fields. There'd be a live test based on the log entries I selected, and then when I hit apply, any future logs would get processed properly.
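
Under the hood, I imagine the "live test" part would amount to a request against the ingest pipeline simulate endpoint (`POST _ingest/pipeline/_simulate`) with a grok processor and the selected entries as sample docs. Here is a sketch that only builds that request body. The sample lines, the `drupal.*` field names, and the pattern itself are invented for illustration and are not a tested Drupal pattern.

```python
import json

# Hypothetical, simplified pipe-separated log lines standing in for the selected entries.
SAMPLE_LINES = [
    "https://example.com|1623780000|php|203.0.113.7|Notice: undefined index",
    "https://example.com|1623780060|cron|203.0.113.7|Cron run completed.",
]

# Illustrative grok pattern; the literal pipes are escaped because | means "or" in a regex.
GROK_PATTERN = (
    r"%{DATA:drupal.site}\|%{NUMBER:drupal.timestamp}\|%{DATA:drupal.type}"
    r"\|%{IP:source.ip}\|%{GREEDYDATA:drupal.message}"
)


def build_simulate_body(lines, pattern):
    """Build a simulate request body: one grok processor, one sample doc per line."""
    return {
        "pipeline": {
            "processors": [
                {"grok": {"field": "message", "patterns": [pattern]}}
            ]
        },
        "docs": [{"_source": {"message": line}} for line in lines],
    }


if __name__ == "__main__":
    print(json.dumps(build_simulate_body(SAMPLE_LINES, GROK_PATTERN), indent=2))
```

Hitting "apply" would then mean saving the same processor list as a real pipeline.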

Let me define a "drop" processor

Similar to above, but instead of grok, we'd just be telling the system that any logs matching our criteria can be dropped.


Anyway, those are some of the things I've been thinking of as I've worked with Observability. Hopefully there are a few good ideas in there. :slight_smile:


Hey David, this is awesome! I don't work directly on Observability, so I think someone from that team will be able to provide better feedback on most of your items. However, one item stood out to me:

Let me define a "drop" processor

Similar to above, but instead of grok, we'd just be telling the system that any logs matching our criteria can be dropped.

In Stack Management there's an Ingest Node Pipelines UI which allows you to define and edit pipelines for processing ingested documents. There's a neat debug feature so you can test it out with sample docs and ensure it does what you want. One of the processors you can choose is the drop processor which seems to behave the way you describe. Does this give you what you're looking for? If not can you help me understand what you need and what's missing? Thanks again!
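
As a rough illustration of what such a pipeline looks like, here is a sketch that builds a pipeline definition with a single conditional drop processor. The function name and the example dataset are assumptions for the sake of the example; the result is the kind of body you would save through the Ingest Node Pipelines UI or `PUT _ingest/pipeline/<name>`.

```python
import json


def drop_pipeline(dataset):
    """Build a pipeline definition that drops every event from the given dataset."""
    # Assumes the dataset name contains no single quotes, since it is
    # embedded directly in the Painless condition below.
    return {
        "description": f"Drop events from {dataset}",
        "processors": [
            # ?. keeps the condition from failing on documents without an event object.
            {"drop": {"if": f"ctx.event?.dataset == '{dataset}'"}}
        ],
    }


if __name__ == "__main__":
    # Example dataset name only, use whatever your own events carry.
    print(json.dumps(drop_pipeline("fortinet.firewall"), indent=2))
```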

I did run across that interface. It is pretty close to what I'm looking for, I think. I haven't actually made it work yet, though. So thanks for pointing me to the correct docs.

Maybe my idea would be for something of a shortcut, or wizard, to that UI? Something that fills things out most of the way for you?

So, I've split my postfix logs out into some fields, but now a lot of the log lines in the logs stream are blank... It'd be awesome if we could say "show this field in the stream".

Hi @jerrac,

this is an awesome list. Thanks so much for putting it together. It's very valuable input for our planning discussions.

So, I've split my postfix logs out into some fields, but now a lot of the log lines in the logs stream are blank... It'd be awesome if we could say "show this field in the stream".

The Logs UI stream assumes an ECS-compliant mapping, so it primarily tries to display the message field. Granted, there are some heuristics to pull in data from other fields, but I'd recommend not relying on these. In addition to the message field, you should be able to add any other field as a column to the Logs UI stream via its settings page.
