Filtering based on change in values

For several years I've logged data from some devices in CSV files, and used perl or grep to analyse them in various ways.

I'm trying to do the same thing in Elasticsearch, primarily to benefit from far faster retrieval than grepping tens of gigabytes of data.

I'm polling and recording increasing numbers every 10 seconds. Data looks like this:

|time|service|hostname|received|recovered|lost|
|---|---|---|---|---|---|
|20:00:00|Bob|host130|45|3|3|
|20:00:10|Bob|host130|167|6|8|
|20:00:20|Bob|host130|289|8|12|
|20:00:30|Bob|host130|412|8|12|
|20:00:40|Bob|host130|516|8|12|
|20:00:50|Bob|host130|678|8|12|
|20:01:00|Bob|host130|711|12|16|
|20:01:10|Bob|host130|734|12|20|
|20:01:20|Bob|host130|789|12|20|

Sometimes the counters reset to zero, and the "service" changes:

|time|service|hostname|received|recovered|lost|
|---|---|---|---|---|---|
|20:00:00|Joan|host131|51|3|3|
|20:00:10|Joan|host131|51|6|8|
|20:00:20|Joan|host131|235|8|12|
|20:00:30|Joan|host131|371|8|12|
|20:00:40|Joan|host131|414|8|12|
|20:00:50|Dave|host131|0|0|0|
|20:01:00|Dave|host131|150|1|6|
|20:01:10|Dave|host131|301|1|6|
|20:01:20|Dave|host131|460|4|13|

I can graph the delta using a serial diff, but what I'd really like is a data table telling me when the value changes, so the above would list something like:

|time|service|hostname|received|recovered|lost|
|---|---|---|---|---|---|
|20:00:00|Bob|host130|45|3|3|
|20:00:00|Joan|host131|51|3|3|
|20:00:10|Bob|host130|167|6|8|
|20:00:10|Joan|host131|51|6|8|
|20:00:20|Bob|host130|289|8|12|
|20:00:20|Joan|host131|235|8|12|
|20:00:50|Dave|host131|0|0|0|
|20:01:00|Bob|host130|711|12|16|
|20:01:00|Dave|host131|150|1|6|
|20:01:10|Bob|host130|734|12|20|
|20:01:20|Dave|host131|460|4|13|

My poller does not retain the "last" value stored, so it can't compute the delta itself, and ideally I'd like the poller to avoid storing state.

Perhaps you could use "Top hits" (?)

Your pivot would be the hostname, and you could select on the last-value for this.

I'm not quite sure what you mean with "when last value changes" though. In that result list, the Dave-host131 tuple starts to appear, but I'm not sure why some of the rows are dropped. Is it based on the recovered-lost columns somehow?

Yes, the service being received switched from "Joan" to "Dave" at 20:00:45, so the counters reset. I'm interested in that, at least sometimes.

Basically I'd like to do something like:

For each line

  1. Select the previous line that matches (host=host131)
  2. Compare that line's "lost" column with this line's "lost" column
  3. If they are the same, filter out the current line
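The three steps above can be sketched outside Elasticsearch to check the expected output. A minimal Python version (assuming rows have already been parsed into dicts; field names are taken from the table headers) keeps only rows whose "lost" value differs from the previous row seen for the same hostname:

```python
def filter_changes(rows):
    """Keep rows whose 'lost' value changed since the last row for that host."""
    last_lost = {}  # hostname -> last "lost" value seen
    changed = []
    for row in rows:
        host = row["hostname"]
        if host not in last_lost or row["lost"] != last_lost[host]:
            changed.append(row)
        last_lost[host] = row["lost"]
    return changed

# Sample rows for host131, matching the second table above
rows = [
    {"time": "20:00:00", "service": "Joan", "hostname": "host131", "lost": 3},
    {"time": "20:00:10", "service": "Joan", "hostname": "host131", "lost": 8},
    {"time": "20:00:20", "service": "Joan", "hostname": "host131", "lost": 12},
    {"time": "20:00:30", "service": "Joan", "hostname": "host131", "lost": 12},
    {"time": "20:00:50", "service": "Dave", "hostname": "host131", "lost": 0},
]
for r in filter_changes(rows):
    print(r["time"], r["service"], r["lost"])
```

Running this drops the 20:00:30 row (lost stayed at 12) but keeps the 20:00:50 reset, matching the desired table.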

hmmm...


High level approach:

I would create a scripted field (https://www.elastic.co/guide/en/kibana/current/scripted-fields.html) that concatenates the service, hostname, and lost fields into a single value. That creates an entity that "changes" whenever the service and/or lost field changes.

The definition of that field would be something like:

doc['service'].value + '|' + doc['hostname'].value + '|' + doc['lost'].value

(YMMV, I didn't test this syntax on real data.)

Then, do a terms aggregation on this scripted field, and use top_hits to keep the values of the "first" document (you're not interested in later documents if the "lost" field doesn't change).
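As a sketch of that terms-plus-top_hits idea, here is a possible request body written as a Python dict. This is untested and makes assumptions: the field names (`service`, `hostname`, `lost`, `time`) match your mapping, and the concatenation runs as an inline Painless script inside the terms aggregation rather than as a Kibana scripted field:

```python
# Hypothetical Elasticsearch request body: bucket on the concatenated
# service|hostname|lost key, then keep the earliest document per bucket
# with a top_hits sub-aggregation sorted ascending on time.
query = {
    "size": 0,
    "aggs": {
        "change_points": {
            "terms": {
                "script": {
                    "source": "doc['service'].value + '|' + doc['hostname'].value + '|' + doc['lost'].value",
                    "lang": "painless",
                },
                "size": 1000,  # raise if you expect more distinct change points
            },
            "aggs": {
                "first_doc": {
                    "top_hits": {
                        "size": 1,
                        "sort": [{"time": {"order": "asc"}}],
                        "_source": ["time", "service", "hostname",
                                    "received", "recovered", "lost"],
                    }
                }
            },
        }
    },
}
```

Each bucket then holds one document: the first time that particular service/hostname/lost combination appeared.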


To create that table in the "Data table" visualization in Kibana:

  • Select that new scripted field with the concatenation as your term in the Bucket configuration (Split rows > Terms). That's the "pivot" of your entity.
  • Then, for "Metrics", create a top-hits metric for each of the "time", "service", "hostname", "received", "recovered", and "lost" fields.
