Display top value of counter rate of nested fields

Hello there,

I am in the middle of adding some SNMP monitoring data to our Elastic stack, and I cannot find a good way to display the data in a Kibana dashboard.

I use the Logstash SNMP input plugin and, thanks to this thread, I now collect a lot of metrics for all the interfaces of my devices. Now I am trying to filter and display them.

The _source field of my documents looks something like this:

    "host": {
      "name": "host398237a",
      "version": "15.2.x",
      "uptime": 248
    },
    "interfaces": {
      "Ethernet1/1": {
        "ifInErrors": 0,
        "ifOperStatus": 1,
        "ifAdminStatus": 1,
        "ifOutOctets": 654810,
        "ifInOctets": 69647,
        "ifOutErrors": 0
      },
      "Ethernet1/2": {
        "ifInErrors": 1489,
        "ifOperStatus": 1,
        "ifAdminStatus": 1,
        "ifOutOctets": 644940,
        "ifInOctets": 7090641,
        "ifOutErrors": 0
      }
    }

What I would like is a visualization that displays the "top evolving" values of the error counters.

As you can see, one of the interfaces has errors on it (ifInErrors); I don't really care about the absolute value here, what I want to know is whether it has changed since the last document. I don't even care if the value decreases (that would just mean the counter has been reset).

I am almost certain "counter rate" is what I want here, but where I struggle is getting the "top X counter rates across all interfaces of all devices". I don't even know if that is doable.

Right now I can have one Kibana Lens visualization per value to monitor:

  • lens1 : interfaces.Ethernet1/1.ifInErrors
  • lens2 : interfaces.Ethernet1/1.ifOutErrors
  • lens3 : interfaces.Ethernet1/2.ifInErrors
  • lens4 : interfaces.Ethernet1/2.ifOutErrors
  • ...

But as you can see, that is not very realistic: it would mean having number_of_interfaces * number_of_metrics visualizations, which can easily reach into the hundreds, if not thousands.

I think my issue here is that Elasticsearch only lets me work on a concrete field like "interfaces.Ethernet1/1.ifInErrors", whereas I want to work on something like "interfaces.*.ifInErrors".

I hope this was clear enough; any help is appreciated!

Regards

@Marius_Dragomir will TSVB help here? I think so?

Thanks,
Bhavya

If I understand this correctly, you're interested in which interface has the largest increase in ifInErrors between each document?

If that's what you need, the way the data is structured won't really help you, since we can't do much logic based on field names (it's exactly like doing logic based on variable names in a programming language).

Normally you would have to split that one document into as many documents as you have interfaces, something like this:
1st doc

    {
      "host": {
        "name": "host398237a",
        "version": "15.2.x",
        "uptime": 248
      },
      "interface": "Ethernet1/1",
      "ifInErrors": 0,
      "ifOperStatus": 1,
      "ifAdminStatus": 1,
      "ifOutOctets": 654810,
      "ifInOctets": 69647,
      "ifOutErrors": 0
    }

2nd doc:

    {
      "host": {
        "name": "host398237a",
        "version": "15.2.x",
        "uptime": 248
      },
      "interface": "Ethernet1/2",
      "ifInErrors": 1489,
      "ifOperStatus": 1,
      "ifAdminStatus": 1,
      "ifOutOctets": 644940,
      "ifInOctets": 7090641,
      "ifOutErrors": 0
    }

I think this is doable from the Logstash pipeline when ingesting via that plugin.
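For illustration, here is a minimal sketch of that split in plain Ruby (the kind of logic you could put inside a Logstash `ruby` filter; the `split_interfaces` helper name is hypothetical, and the field names are taken from the sample document above):

```ruby
# Sketch: turn one host document with a nested "interfaces" hash into
# one document per interface, keeping the host metadata on each copy.
def split_interfaces(doc)
  host = doc["host"]
  doc["interfaces"].map do |name, counters|
    # Flatten the per-interface counters to the root of the new document.
    { "host" => host, "interface" => name }.merge(counters)
  end
end

source = {
  "host" => { "name" => "host398237a", "version" => "15.2.x", "uptime" => 248 },
  "interfaces" => {
    "Ethernet1/1" => { "ifInErrors" => 0,    "ifOutErrors" => 0 },
    "Ethernet1/2" => { "ifInErrors" => 1489, "ifOutErrors" => 0 }
  }
}

docs = split_interfaces(source)
puts docs.length              # 2
puts docs[1]["interface"]     # Ethernet1/2
```

In an actual pipeline you would emit each generated hash as its own event instead of returning an array, but the reshaping itself is just this.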

Hi,

Thank you guys for your answers.

you're interested in which interface has the largest increase in ifInErrors between each document

That's it, with a couple of extra points:

  • having the "top interface" would be a good start but what would be awesome is the "top 10 interfaces ordered by speed of error counter increase"
  • the 10 interfaces from this list could all be part of the same "host" or spread across multiple hosts

I did not mention it in my first post, but I don't mind changing the data structure of the documents.
I did it this way because it was easily human-readable, but humans will use the dashboards anyway, so...

I am not a big fan of splitting one document into many, though: right now every document has a lot of other fields (like location, customer, other metrics, hardware & firmware information, etc.) used for various dashboards.

Splitting one document into 10, 50, or even 100 will add a lot of overhead, I think, but maybe that is the way to go.

Your answers made me think of two solutions :

  1. have a dedicated index for these interface counter metrics (so we don't mess with the current index & index mapping), where, as you suggested, a document no longer represents a host but an interface.
  2. keep the existing index, and even the nested interfaces field, and add a new field at the root level containing the sum of all interface error counters.

The first solution will keep our current index clean, but create a new one with a lot of overhead.

I am starting to think the second option would be the lesser evil; the new documents would look something like this:

    {
      "host": {
        "name": "host398237a",
        "version": "15.2.x",
        "uptime": 248
      },
      "globalErrorCounter": 1489,
      "interfaces": {
        "Ethernet1/1": {
          "ifInErrors": 0,
          "ifOperStatus": 1,
          "ifAdminStatus": 1,
          "ifOutOctets": 654810,
          "ifInOctets": 69647,
          "ifOutErrors": 0
        },
        "Ethernet1/2": {
          "ifInErrors": 1489,
          "ifOperStatus": 1,
          "ifAdminStatus": 1,
          "ifOutOctets": 644940,
          "ifInOctets": 7090641,
          "ifOutErrors": 0
        }
      }
    }

With that, we should be able to have a dashboard listing the "top 10 hosts ordered by globalErrorCounter increase speed", right?

After that, if we want a closer look at the interface level, it can be done manually for now: the scope to look at will already have been greatly reduced (from thousands to dozens).
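The sum itself is cheap to compute at ingest time. A minimal Ruby sketch (e.g. for a Logstash `ruby` filter; the helper name and the list of error fields are assumptions, field names come from the sample document):

```ruby
# Sketch: add a root-level globalErrorCounter that sums all per-interface
# error counters, leaving the nested "interfaces" field untouched.
ERROR_FIELDS = ["ifInErrors", "ifOutErrors"]

def add_global_error_counter(doc)
  total = doc["interfaces"].values.sum do |counters|
    ERROR_FIELDS.sum { |f| counters.fetch(f, 0) }
  end
  doc.merge("globalErrorCounter" => total)
end

doc = {
  "host" => { "name" => "host398237a" },
  "interfaces" => {
    "Ethernet1/1" => { "ifInErrors" => 0,    "ifOutErrors" => 0 },
    "Ethernet1/2" => { "ifInErrors" => 1489, "ifOutErrors" => 0 }
  }
}

puts add_global_error_counter(doc)["globalErrorCounter"]  # 1489
```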

I would gladly hear your thoughts about all that :slight_smile:

Regards

The overhead won't be as big as you imagine, usually Elasticsearch is pretty good at parallelizing these operations.
Your first solution will increase disk footprint significantly, if that's not a problem, I would go with it.
The second one is good as well; if you don't need fine-grained data, it will just add a very small overhead at ingest time (still smaller than ingesting 2x the documents). You could prototype this by creating a scripted field or a runtime field that calculates the sum without having to change how your documents are structured right now.
This is just basic advice on performance and tuning; the Elasticsearch Discuss forum can help you go into more depth if you have concerns based on your ingest rate and ES configuration.
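To illustrate the runtime-field idea, a sketch of a mapping update (the index name is hypothetical, and note that iterating `params._source` in a runtime field script is paid per query, so it's best suited for prototyping before baking the sum in at ingest):

    PUT snmp-index/_mapping
    {
      "runtime": {
        "globalErrorCounter": {
          "type": "long",
          "script": {
            "source": "long total = 0; for (def iface : params._source['interfaces'].values()) { total += iface['ifInErrors'] + iface['ifOutErrors']; } emit(total);"
          }
        }
      }
    }

Once that field exists, Lens and aggregations can treat `globalErrorCounter` like any other mapped field.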