(Note, these are my personal wishes, not my workplace's.)
Since we started a trial at my workplace, I've been digging into everything the Elastic Stack has to offer, mostly focused on Observability.
Here are some ideas I haven't yet seen a way to do in the stack as it is now (7.13). (I may well just not have found the right docs or screen, so feel free to point me in the right direction.)
Alerting
Alerts should have their own screen
Managing alerts is a HUGE part of why we are looking at the Elastic Stack. Alert fatigue is a problem, so being able to easily manage alerts is essential. Right now, well, the UI for alerts in Kibana feels disorganized and confusing. I think its goal is to make it easy to add alerts, and I think it does do that, but working with alerts after they happen is not so nice.
Acknowledgement
How do you acknowledge an alert? So, I get an email saying that server Frost is at 100% disk usage. I need a way to jump in and say "Hey, I saw this, no need to keep emailing me."
The best I can find is buried in Stack Management -> Rules and Connectors -> < the alert >. You can mute some of the hosts there.
What I'd really rather have is a screen listing all hosts and active alerts. Then for each host I could set each alert to a useful status.
Escalation
How do you escalate an alert?
Here's what I mean by that:
If an alert is not acknowledged for a period of time, the system should automatically start trying to contact other people, based on a configured policy.
If a support person sees an alert, and knows it should go to another level of support, they should be able to click a button to have the system start telling the right people about it.
That kind of thing.
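To make the two escalation cases above concrete, here is a minimal sketch of how such a policy could be evaluated. Nothing in it is an Elastic feature or API: `EscalationLevel`, `Alert`, `contacts_to_notify`, the contact names, and the time thresholds are all hypothetical, and the policy is assumed to be sorted by ascending delay.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EscalationLevel:
    after_minutes: int   # how long an alert may stay unacknowledged before this level kicks in
    contacts: List[str]  # who gets notified at this level (hypothetical names)


@dataclass
class Alert:
    host: str
    name: str
    minutes_open: int
    acknowledged: bool = False
    forced_level: Optional[int] = None  # set when a person escalates manually


def contacts_to_notify(alert: Alert, policy: List[EscalationLevel]) -> List[str]:
    """Return the contacts for the highest escalation level the alert has reached."""
    level = -1
    # Time-based escalation only applies while nobody has acknowledged the alert.
    if not alert.acknowledged:
        for index, step in enumerate(policy):
            if alert.minutes_open >= step.after_minutes:
                level = index
    # A manual escalation wins if it points at a higher level.
    if alert.forced_level is not None:
        level = max(level, min(alert.forced_level, len(policy) - 1))
    return list(policy[level].contacts) if level >= 0 else []


# Hypothetical three-step policy.
POLICY = [
    EscalationLevel(after_minutes=0, contacts=["on-call"]),
    EscalationLevel(after_minutes=30, contacts=["team-lead"]),
    EscalationLevel(after_minutes=60, contacts=["manager"]),
]
```

With this policy, an unacknowledged disk alert on Frost that has been open for 45 minutes would go to the team lead, an acknowledged one would notify nobody, and a manual escalation would jump straight to the chosen level.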
Observability -> Metrics
Let me see the entire hostname, please.
If you're not in the table view, and you have more than a few hosts, you cannot see more than a few characters of the hostname. Heck, sometimes the actual value for the load or usage is truncated. It would be really nice to be able to tell it to show the entire hostname.
Color code table rows
Just color code the table rows the same way the cards are color coded in the other view.
Let me see multiple metrics
In the temperature/card view, I'd love to be able to see CPU and RAM usage at the same time.
Add filesystem metrics
I was able to add the 'system.filesystem.used.pct' metric, but I don't see any way to filter that so it only uses data from specific mount points.
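For comparison, this is roughly what the filter looks like when the data is queried directly instead of through the Metrics UI. It is only a sketch: it assumes the documents come from Metricbeat's system module, which records the mount point in `system.filesystem.mount_point`, and the helper name, the mount points, and the time range are made up for illustration.

```python
def build_filesystem_query(mount_points, time_range="now-15m"):
    """Build an Elasticsearch search body: max disk usage per host,
    restricted to the given mount points."""
    return {
        "size": 0,  # only the aggregation is needed, not the raw documents
        "query": {
            "bool": {
                "filter": [
                    # Keep only filesystem documents for the wanted mount points.
                    {"terms": {"system.filesystem.mount_point": list(mount_points)}},
                    {"range": {"@timestamp": {"gte": time_range}}},
                ]
            }
        },
        "aggs": {
            "per_host": {
                "terms": {"field": "host.name", "size": 100},
                "aggs": {
                    "max_used_pct": {
                        "max": {"field": "system.filesystem.used.pct"}
                    }
                },
            }
        },
    }


# Example mount points (assumptions, adjust to the hosts in question).
body = build_filesystem_query(["/", "/var"])
```

The resulting `body` could be sent to the `_search` endpoint of the Metricbeat indices. Being able to express the same `terms` filter on the custom metric in the inventory view is the part that seems to be missing.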
Observability -> Uptime
Let me set the default sort
Specifically, I want to set the overview table to show down items first.
Observability -> Logs
Let me easily filter out logs
Case in point: Fortinet logs. I really could use a way to quickly filter out that specific event.dataset. I mean, I can easily filter FOR it via the log details screen... at least I think that's what the "View event with filter" button does.
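Typed by hand, an exclusion like this can be written as a negated KQL clause. The sketch below builds such a clause for one or more datasets; the helper name is hypothetical, and `fortinet.firewall` is only an assumed dataset name, so check the actual `event.dataset` value in the log details first.

```python
def exclude_datasets_kql(datasets):
    """Build a KQL expression that hides every listed event.dataset."""
    clauses = []
    for dataset in datasets:
        # Escape backslashes and double quotes so the value stays inside its quotes.
        escaped = dataset.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'not event.dataset : "{escaped}"')
    return " and ".join(clauses)


# Assumed dataset name, for illustration only.
kql = exclude_datasets_kql(["fortinet.firewall"])
```

Here `kql` is the string `not event.dataset : "fortinet.firewall"`. A one-click "filter out" button next to the existing filter button would save writing it by hand.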
Let me define a grok pattern
Ok, this would likely take a lot of work, but man would it be useful. (I think...)
When I see a log pattern, for example Drupal logs in syslog, I'd like to be able to select several entries, then build out a grok pattern for them that breaks the data out into useful fields. There'd be a live test based on the log entries I selected, and then when I hit apply, any future logs would get processed properly.
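The "live test" part of that workflow can be sketched in a few lines. This is not grok itself: a Python regular expression with named groups stands in for a grok pattern, and the sample lines, the field names, and the pipe-delimited layout are invented for illustration rather than taken from real Drupal logs.

```python
import re

# Hypothetical pattern for pipe-delimited lines. The named groups play the
# role of grok's %{SYNTAX:field} captures.
PATTERN = re.compile(
    r"^(?P<base_url>[^|]+)\|(?P<timestamp>\d+)\|(?P<type>[^|]+)\|"
    r"(?P<ip>[^|]+)\|(?P<message>.*)$"
)


def live_test(pattern, lines):
    """Run the pattern against each selected line.

    Returns one entry per line: a dict of extracted fields, or None when
    the line does not match.
    """
    results = []
    for line in lines:
        match = pattern.match(line)
        results.append(match.groupdict() if match else None)
    return results


# Invented sample entries standing in for the log lines selected in the UI.
SAMPLES = [
    "https://example.com|1625000000|php|203.0.113.7|Notice: undefined index",
    "this line does not follow the layout",
]

results = live_test(PATTERN, SAMPLES)
```

The first sample yields a field dict and the second yields `None`, which is exactly the feedback that would be useful while tweaking the pattern before hitting apply.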
Let me define a "drop" processor
Similar to above, but instead of grok, we'd just be telling the system that any logs matching our criteria can be dropped.
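Elasticsearch ingest pipelines do have a `drop` processor that accepts an `if` condition, so the end result of such a screen might look something like the pipeline body built below. Treat it as a sketch under assumptions: the helper name and the dataset value are hypothetical, and the condition assumes events arrive with `event.dataset` as a nested field.

```python
import re


def build_drop_pipeline(dataset):
    """Build an ingest pipeline body that drops every event from one dataset."""
    # Allow only simple dataset names so the value is safe to embed in the
    # Painless condition below.
    if not re.fullmatch(r"[A-Za-z0-9_.\-]+", dataset):
        raise ValueError(f"unexpected characters in dataset name: {dataset!r}")
    return {
        "description": f"Drop all events from {dataset}",
        "processors": [
            # The null-safe ?. avoids errors on documents without an event object.
            {"drop": {"if": f"ctx.event?.dataset == '{dataset}'"}}
        ],
    }


# Assumed dataset name, for illustration only.
pipeline = build_drop_pipeline("fortinet.firewall")
```

The wish is for the UI to generate and attach something like `pipeline` from criteria picked in the log view, rather than having to write it by hand.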
Anyway, those are some of the things I've been thinking of as I've worked with Observability. Hopefully there are a few good ideas in there.