Graphing last entry of the month

(Paul) #1

I have a security product that includes a vendor-provided ELK stack. One of the things I'm trying to get my head around is building trend dashboards that cover a long period of time.

Security reports run daily, but there can be re-runs and missed runs. Each run produces a single Elasticsearch record per device, and each record includes a risk score.

Initially, I'm just looking for an aggregated monthly view of risk. Given the duplicate and/or missing data, though, simply summing risk across the month would skew the visualisation so much that it becomes meaningless.
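To make the skew concrete, here is a small sketch using made-up records (the field names `device`, `timestamp`, and `risk` are assumptions, not the vendor's schema). It compares a naive monthly sum with a sum over only each device's last entry in the month.

```python
from collections import defaultdict
from datetime import datetime

# Made-up report runs; field names are assumptions, not the vendor's schema.
records = [
    {"device": "host-a", "timestamp": "2024-03-01T02:00:00", "risk": 40},
    {"device": "host-a", "timestamp": "2024-03-01T09:00:00", "risk": 40},  # re-run
    {"device": "host-a", "timestamp": "2024-03-31T02:00:00", "risk": 30},
    {"device": "host-b", "timestamp": "2024-03-02T02:00:00", "risk": 70},  # later runs missed
]


def naive_monthly_sum(recs):
    """Sum every record in the month: re-runs inflate the total."""
    totals = defaultdict(int)
    for r in recs:
        totals[r["timestamp"][:7]] += r["risk"]  # "YYYY-MM"
    return dict(totals)


def last_entry_monthly_sum(recs):
    """Keep only each device's newest record per month, then sum."""
    latest = {}  # (month, device) -> (timestamp, risk)
    for r in recs:
        ts = datetime.fromisoformat(r["timestamp"])
        key = (ts.strftime("%Y-%m"), r["device"])
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, r["risk"])
    totals = defaultdict(int)
    for (month, _device), (_ts, risk) in latest.items():
        totals[month] += risk
    return dict(totals)


print(naive_monthly_sum(records))       # {'2024-03': 180}
print(last_entry_monthly_sum(records))  # {'2024-03': 100}
```

A device whose runs stopped mid-month still contributes its last known score here; that is a policy choice rather than something the data dictates.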

The question is how best to approach this with Elasticsearch/Kibana. I think I need some form of aggregation query that returns only the last entry in the month for each device, but I can't yet get my head around how to do this in ES. My thinking is that once I have this query, handling different graph types should be relatively straightforward.
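One way to express "last entry in the month per device" in Elasticsearch is a `date_histogram` (monthly buckets) wrapping a `terms` aggregation (one bucket per device) wrapping a `top_hits` aggregation with `size: 1`, sorted newest first. The sketch below only builds the request body and sums the returned risk scores client-side. The field names `@timestamp`, `device_id` (assumed to be a keyword field), and `risk_score` are assumptions, and the `terms` size of 10000 assumes fewer devices than that; with more, a `composite` aggregation is one option for paging through them.

```python
import json

# Hypothetical field names; adjust to the actual index mapping.
TS_FIELD = "@timestamp"
DEVICE_FIELD = "device_id"  # assumed to be a keyword field
RISK_FIELD = "risk_score"


def build_monthly_last_entry_query(max_devices=10000):
    """Search body: month buckets -> device buckets -> newest document per device."""
    return {
        "size": 0,  # only the aggregations are needed, not the raw hits
        "aggs": {
            "per_month": {
                "date_histogram": {"field": TS_FIELD, "calendar_interval": "month"},
                "aggs": {
                    "per_device": {
                        "terms": {"field": DEVICE_FIELD, "size": max_devices},
                        "aggs": {
                            "latest_run": {
                                # keep only this device's most recent run in this month
                                "top_hits": {
                                    "size": 1,
                                    "sort": [{TS_FIELD: {"order": "desc"}}],
                                    "_source": {"includes": [RISK_FIELD]},
                                }
                            }
                        },
                    }
                },
            }
        },
    }


def monthly_risk_totals(response):
    """Sum the risk score of each device's last run, per month, from a search response dict."""
    totals = {}
    for month in response["aggregations"]["per_month"]["buckets"]:
        total = 0
        for device in month["per_device"]["buckets"]:
            hit = device["latest_run"]["hits"]["hits"][0]
            total += hit["_source"][RISK_FIELD]
        totals[month["key_as_string"]] = total
    return totals


print(json.dumps(build_monthly_last_entry_query(), indent=2))
```

The summing happens client-side here because `top_hits` returns documents rather than a single numeric value.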

Are there any pointers on how to solve this, or background reading that would point me in the right direction? As a new ELK user, I'm also wondering whether there's a more ELK-centric way of thinking about the problem.

(Tim Sullivan) #2

It sounds like the Elasticsearch data you have today is what I'd call "dirty data," and it isn't yet in good shape for consumption in Kibana. Kibana, as a visualization tool, tries to keep things simple on the data selection side and trusts that the source data in Elasticsearch is more or less correct. The filters exposed in the Kibana toolset are there to let you look at the data in different ways, not to "clean up" the data itself.

My advice is to add a step to the ingestion workflow that periodically searches the dirty data, cleans it up using advanced queries or scripting tools, and then re-ingests the clean data into an index with a different name prefix.
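A minimal sketch of such a cleanup step, assuming hypothetical fields `@timestamp`, `device_id`, and `risk_score` and a hypothetical target index `clean-security-reports`. It keeps each device's newest run per day, so re-runs collapse to one record, and renders the result as a body for Elasticsearch's `_bulk` API. Fetching the raw records and sending the request are left out.

```python
import json
from datetime import datetime

CLEAN_INDEX = "clean-security-reports"  # hypothetical index name with a new prefix


def dedupe_daily(records, ts_field="@timestamp", device_field="device_id"):
    """Keep only the newest record per device per calendar day.

    Assumes timestamps are ISO 8601 strings without a trailing "Z"
    (datetime.fromisoformat only accepts "Z" from Python 3.11).
    """
    latest = {}
    for rec in records:
        ts = datetime.fromisoformat(rec[ts_field])
        key = (rec[device_field], ts.date().isoformat())
        if key not in latest or ts > datetime.fromisoformat(latest[key][ts_field]):
            latest[key] = rec
    return latest


def to_bulk_ndjson(latest, index=CLEAN_INDEX):
    """Render deduplicated records as a _bulk request body (NDJSON).

    A deterministic _id of "<device>-<day>" means re-running the cleanup
    overwrites the same document instead of adding another duplicate.
    """
    lines = []
    for (device, day), rec in sorted(latest.items()):
        lines.append(json.dumps({"index": {"_index": index, "_id": f"{device}-{day}"}}))
        lines.append(json.dumps(rec))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline


raw = [
    {"@timestamp": "2024-03-01T02:00:00", "device_id": "host-a", "risk_score": 40},
    {"@timestamp": "2024-03-01T09:00:00", "device_id": "host-a", "risk_score": 35},  # re-run
    {"@timestamp": "2024-03-02T02:00:00", "device_id": "host-b", "risk_score": 70},
]
print(to_bulk_ndjson(dedupe_daily(raw)), end="")
```

Because the clean index then holds at most one record per device per day, monthly views built on it no longer double-count re-runs.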

If the duplicated data has a field that can be collapsed on, such as a tracking ID, take a look at field collapsing in Elasticsearch.
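A sketch of a field-collapsing search body, assuming the same hypothetical `device_id` (keyword) and `@timestamp` fields. Restricting the query to one month and collapsing on the device returns one hit per device, and because hits are sorted newest first, that hit is the device's last run in the month. Collapsing only affects the returned hits, not aggregations, so it fits the cleanup step better than a Kibana chart.

```python
def latest_per_device_query(start, end, ts_field="@timestamp", device_field="device_id", size=1000):
    """Search body returning each device's newest document within [start, end)."""
    return {
        "query": {"range": {ts_field: {"gte": start, "lt": end}}},
        "collapse": {"field": device_field},      # one hit per distinct device value
        "sort": [{ts_field: {"order": "desc"}}],  # so that hit is the newest run
        "size": size,                             # assumed upper bound on devices per page
    }


march = latest_per_device_query("2024-03-01", "2024-04-01")
```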

Gaps in the data aren't really a big concern. When you aggregate the data into buckets, Elasticsearch has options that control how gaps are handled.
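A sketch of the usual bucket-level gap controls, with the same assumed field names. `min_doc_count: 0` plus `extended_bounds` makes the `date_histogram` return empty months instead of omitting them, and pipeline aggregations such as `derivative` accept a `gap_policy` (for example `skip` or `insert_zeros`) for buckets that have no value. An `avg` is used here because it is null for an empty bucket, whereas a `sum` is 0, so the choice of metric decides whether there is a gap at all.

```python
def monthly_histogram_with_gaps(start, end, ts_field="@timestamp", risk_field="risk_score"):
    """Monthly buckets that keep empty months and skip them in the derivative."""
    return {
        "size": 0,
        "query": {"range": {ts_field: {"gte": start, "lte": end}}},
        "aggs": {
            "per_month": {
                "date_histogram": {
                    "field": ts_field,
                    "calendar_interval": "month",
                    "min_doc_count": 0,  # emit empty months instead of dropping them
                    "extended_bounds": {"min": start, "max": end},  # even at the range edges
                },
                "aggs": {
                    "avg_risk": {"avg": {"field": risk_field}},  # null for an empty month
                    "risk_change": {
                        "derivative": {
                            "buckets_path": "avg_risk",
                            "gap_policy": "skip",  # or "insert_zeros"
                        }
                    },
                },
            }
        },
    }


year_2024 = monthly_histogram_with_gaps("2024-01-01", "2024-12-31")
```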

I think the gap controls are configurable when building Kibana visualizations, but if you have more questions about that, let me know.