Split document into multiple documents based on condition

Hi there,

I plan to split fields into multiple documents based on OID. I have configured an SNMP pipeline for my network device. I managed to monitor CPU and memory usage, but I'm facing a problem when I want to monitor network traffic on this device. In my pipeline configuration, I have already added config like the below:

walk => ["1.3.6.1.2.1.2.2.1.2", "1.3.6.1.2.1.2.2.1.10", "1.3.6.1.2.1.2.2.1.16"]

1.3.6.1.2.1.2.2.1.2 is the OID for the interface name
1.3.6.1.2.1.2.2.1.10 is the OID for the inbound traffic of the interface
1.3.6.1.2.1.2.2.1.16 is the OID for the outbound traffic of the interface
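
For context, that walk option sits inside my snmp input. A minimal sketch of the rest of the input (the host address, community, and interval below are placeholders, not my real values):

    input {
      snmp {
        hosts => [{ host => "udp:192.0.2.1/161" community => "public" version => "2c" }]
        walk => ["1.3.6.1.2.1.2.2.1.2", "1.3.6.1.2.1.2.2.1.10", "1.3.6.1.2.1.2.2.1.16"]
        interval => 60
      }
    }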

After I applied the configuration above, the log in Elastic looks like this:


[screenshots of the resulting fields in Elasticsearch]
From the picture above, is it possible to group the logs by child OID? My goal is to put fields that share the same child index into the same document, separate from fields with a different child index. For example:

The field iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifDescr.1 will be in the same document as iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifInOctets.1 and iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifOutOctets.1,

but in a different document from iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifDescr.10.
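
In other words, I am hoping to end up with one document per interface, roughly like this (the values are just placeholders):

    {
      "ifDescr": "GigabitEthernet0/1",
      "ifInOctets": 123456,
      "ifOutOctets": 654321
    }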

That way, I can display it in Grafana. Can you help me, please? Thank you

If you have another idea, please don't hesitate to share it. FYI, there are 3 devices that I want to monitor.

I think you would have to use a ruby filter. You could start with something like

    ruby {
        code => '
            a = []
            # scan every field on the event and pick out those whose name ends
            # in <columnName>.<index>, e.g. sysOREntry.sysORDescr.7
            event.to_hash.each { |k, v|
                k.match(/sysOREntry\.(\w+)\.(\d+)$/) { |m|
                    i = m[2].to_i
                    a[i] ||= {}    # one hash per index
                    a[i][m[1]] = v # keyed by the column name
                }
            }
            # attach the array of per-index hashes to the event
            event.set("data", a)
        '
    }

which will convert a large set of fields like

  "system.sysORTable.sysOREntry.sysORDescr.7" => "The MIB module for managing UDP implementations",
 "system.sysORTable.sysOREntry.sysORUpTime.1" => 4,
 "system.sysORTable.sysOREntry.sysORUpTime.6" => 4,
     "system.sysORTable.sysOREntry.sysORID.2" => "1.3.6.1.6.3.15.2.1.1",

into

                                       "data" => [
    [ 0] nil,
    [ 1] {
        "sysORUpTime" => 4,
            "sysORID" => "1.3.6.1.6.3.11.3.1.1",
         "sysORDescr" => "The MIB for Message Processing and Dispatching."
    },
    [ 2] {
        "sysORUpTime" => 4,
            "sysORID" => "1.3.6.1.6.3.15.2.1.1",
         "sysORDescr" => "The management information definitions for the SNMP User-based Security Model."
    },
    [ 3] {
....

You will likely need to add error handling and additional filtering.
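
For the ifTable fields in your walk, a possible adaptation (just a sketch, under the assumption that your field names end in ifEntry.<columnName>.<ifIndex>, so verify it against your data) is to match ifEntry instead of sysOREntry and then use a split filter so that each interface becomes its own event:

    ruby {
        code => '
            a = []
            event.to_hash.each { |k, v|
                # field names end in ifEntry.<columnName>.<ifIndex>, e.g. ifEntry.ifInOctets.1
                k.match(/ifEntry\.(\w+)\.(\d+)$/) { |m|
                    i = m[2].to_i
                    a[i] ||= {}
                    a[i][m[1]] = v
                }
            }
            # drop the nil slots so that split only sees real interfaces
            event.set("data", a.compact)
        '
    }
    # one event, and therefore one document, per interface
    split { field => "data" }

Each resulting event will carry the per-interface values under [data]; you can rename them with a mutate filter if you want them at the top level.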

OK, in advance I'm very grateful to you, because now the data matches what I want. But this ruby filter causes my Logstash to hit OOM, and it happens quite often.

Can this ruby filter still be tuned so that my Logstash won't hit OOM again? Thank you

I cannot imagine why that ruby filter would cause an OOM error. See this thread for how to deal with that.

I already applied a filter like you mentioned before, but it just drops the log if it has _rubyexception in tags. It doesn't solve the OOM error. After I applied that filter, I can still see an error log like this:

Maybe for your information: the number of network devices to be monitored has increased to 64, and some of them have 52 interfaces. My Logstash has a 12 gigabyte JVM heap. Do you think I must increase the spec of the Logstash machine, or is there anything else I can do in this pipeline?
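
(By a 12 gigabyte JVM heap I mean the heap configured in Logstash's config/jvm.options, which currently looks along these lines:)

    -Xms12g
    -Xmx12g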

Thank you

Nobody on the internet can solve that problem for you. You need to run the analysis. If you can provide the output of a heap dump analyzer then it is possible someone may be able to help you review it.

From the Memory Analyzer Tool (leak suspects report) I got these 3 suspects:
[screenshots of the three leak suspects]

All the suspects refer to ruby. Here are some details from suspect 1; suspect 3 has similar information to suspect 1.

And here are the details from suspect 2:

[screenshot of the details for suspect 2]

Thank you

Here it is, if you want the file.

If you approve the request for access that I submitted then I will take a look at the dump. The MAT output does not have enough detail to diagnose the problem.

Sorry, can you request it again via this link? I didn't receive a request for access to this folder. I have already uploaded the MAT report and the heap dump file there. Thank you

I have already updated the permission. Technically, you can download it without requesting access. Thank you


Yeah, but in the post you deleted there was a different link which required me to request access. I downloaded it and MAT got an error trying to parse it. Is there any chance you could reproduce the problem with a smaller heap size? A 12 GB heap dump may be more than I can handle.

Maybe I will send you a MAT report of the heap dump file later. If you have any instructions or a tutorial on how to download it, please share them with me. Or you can try to change the -Xmx configuration in the MemoryAnalyzer.ini file inside the MAT folder; if your computer or laptop has 16 GB of RAM, you can set -Xmx to 11g, just like I did before to generate the MAT report.
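
For reference, the relevant part of MemoryAnalyzer.ini that I changed looks roughly like this (the lines after -vmargs are passed to the JVM that MAT runs in):

    -vmargs
    -Xmx11g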


Or you can try this link. I got it from the HeapHero team. I contacted them before to help me analyze my heap dump file, and they just sent the report today.

I didn't really want to do that since it resulted in the JVM growing the memory usage to 15.9 GB and swapping every other program to disk, so that they take 30+ seconds to respond.

That said, it worked and many of those programs will get swapped back into RAM overnight. I'll be back with an analysis in about 14 hours.

MAT only needs about 4 GB to store the analysis, so that's no problem. I was amazed by the level of compression that zip achieved on the heap dump. 150 MB to contain a 12 GB dump. Probably lots and lots of zeroes.

OK, thank you @Badger. Alternatively, you can use the link that I just shared with you from the HeapHero team.

There are two worker threads, each of which has hundreds of millions of references to the same nil object. Each reference retains 40 bytes on the heap.

[screenshots of the two worker threads from the heap dump analysis]

I have absolutely no clue as to how the ruby filter I suggested could cause that.

Is it possible that this is caused by too much data and insufficient memory? Some of the network devices that I monitor have up to 52 interfaces, and I also monitor the inbound and outbound traffic of each interface. So there is a lot of data being sent to Elastic, and maybe Logstash is overwhelmed organizing that data.

I don't see it. The threads have accumulated around half a billion nil objects. I have no idea how that could have happened.

OK, thanks for your help.