I plan to split fields into multiple documents based on OID. I have configured an SNMP pipeline for my network device. I managed to monitor CPU and memory usage, but I'm facing a problem when I want to monitor network traffic on this device. In my pipeline configuration, I already added a config like the one below:
walk => ["1.3.6.1.2.1.2.2.1.2", "1.3.6.1.2.1.2.2.1.10", "1.3.6.1.2.1.2.2.1.16"]
1.3.6.1.2.1.2.2.1.2 is the OID for the interface name (ifDescr)
1.3.6.1.2.1.2.2.1.10 is the OID for the inbound traffic of the interface (ifInOctets)
1.3.6.1.2.1.2.2.1.16 is the OID for the outbound traffic of the interface (ifOutOctets)
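For completeness, that walk option sits inside my snmp input block, roughly like this (the host address, community, and interval here are placeholders, not my real values):

input {
  snmp {
    # placeholder device address and community string
    hosts => [{host => "udp:192.168.1.1/161" community => "public" version => "2c"}]
    # interface name, inbound octet counter, outbound octet counter
    walk => ["1.3.6.1.2.1.2.2.1.2", "1.3.6.1.2.1.2.2.1.10", "1.3.6.1.2.1.2.2.1.16"]
    # poll every 60 seconds
    interval => 60
  }
}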
After I apply the configuration above, the document in Elasticsearch looks like this:
From the picture above, is it possible to group the logs by child OID? My goal is to group the fields that have the same child index into one document, separate from the others. For example:
the field iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifDescr.1 will be in the same document as iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifInOctets.1 and iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifOutOctets.1,
but in a different document from iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifDescr.10.
That way, I can display it in Grafana. Can you help me, please? Thank you.
If you have another idea, please don't hesitate to share it. FYI, there are 3 devices that I want to monitor.
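One way to do this is a ruby filter that collects the walked values by their trailing interface index and emits one event per interface. A rough sketch (it assumes the resolved field names end in ifDescr.N, ifInOctets.N, or ifOutOctets.N; interface_index is just an illustrative field name):

filter {
  ruby {
    code => '
      # collect walked values, keyed by the trailing interface index
      groups = Hash.new { |h, k| h[k] = {} }
      event.to_hash.each do |field, value|
        if field =~ /\.(ifDescr|ifInOctets|ifOutOctets)\.(\d+)\z/
          groups[Regexp.last_match(2)][Regexp.last_match(1)] = value
        end
      end
      # emit one new event per interface, keeping the device address
      groups.each do |index, fields|
        e = LogStash::Event.new(fields)
        e.set("interface_index", index)
        e.set("host", event.get("host"))
        new_event_block.call(e)
      end
      # drop the original combined event
      event.cancel
    '
  }
}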
OK, first of all I'm very grateful to you, because the data now matches what I want. But this ruby filter causes my Logstash to hit OOM, and quite often.
I already applied the filter you mentioned before, but it just drops the event if it has _rubyexception in its tags (see the sketch below); it doesn't solve the OOM error. After I applied that filter, I can still see error logs like this:
For your information, the number of network devices to be monitored has increased to 64, and some of them have 52 interfaces. My Logstash has a 12 gigabyte JVM heap. Do you think I must increase the specs of the Logstash machine, or is there anything else I can do in this pipeline?
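The filter I applied is essentially this, just a conditional drop on the _rubyexception tag (the ruby filter's default error tag):

filter {
  if "_rubyexception" in [tags] {
    drop { }
  }
}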
Nobody on the internet can solve that problem for you. You need to run the analysis. If you can provide the output of a heap dump analyzer then it is possible someone may be able to help you review it.
If you approve the request for access that I submitted then I will take a look at the dump. The MAT output does not have enough detail to diagnose the problem.
Sorry, can you request access again via this link? I didn't receive an access request for this folder. I have already uploaded the MAT report and the heap dump file there. Thank you.
Yeah, but in the post you deleted there was a different link which required me to request access. I downloaded it and MAT got an error trying to parse it. Is there any chance you could reproduce the problem with a smaller heap size? A 12 GB heap dump may be more than I can handle.
Maybe I will send you a MAT report of the heap dump file later; if you have any instructions or a tutorial on how to download it, please share them with me. Or you can try to change the -Xmx setting in the MemoryAnalyzer.ini file inside the MAT folder. If your computer or laptop has 16 GB of RAM, you can set -Xmx to 11g, just like I did before to generate the MAT report.
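For reference, the heap setting goes after the -vmargs line in MemoryAnalyzer.ini, something like this:

-vmargs
-Xmx11g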
Or you can try this link. I got it from the HeapHero team; I contacted them before to help me analyze my heap dump file, and they just sent the report today.
I didn't really want to do that since it resulted in the JVM growing the memory usage to 15.9 GB and swapping every other program to disk, so that they take 30+ seconds to respond.
That said, it worked and many of those programs will get swapped back into RAM overnight. I'll be back with an analysis in about 14 hours.
MAT only needs about 4 GB to store the analysis, so that's no problem. I was amazed by the level of compression that zip achieved on the heap dump. 150 MB to contain a 12 GB dump. Probably lots and lots of zeroes.
There are two worker threads, each of which has hundreds of millions of references to the same nil object. Each reference retains 40 bytes on the heap.
I have absolutely no clue as to how the ruby filter I suggested could cause that.
Is it possible this is caused by too much data and insufficient memory? Some of the network devices that I monitor have up to 52 interfaces, and I also monitor the inbound and outbound traffic of each interface. So there is a lot of data being sent to Elasticsearch, and maybe Logstash is overwhelmed organizing all of that data.