3 questions about memory leak

I am using Logstash 6.7.1.
It really feels like the memory consumption increases day by day. So:

  1. Is there really a memory leak?
    I read a comment in some of the release notes about minimizing it, so I assume it's real. Has it been fixed completely in later versions?

  2. My pipelines make heavy use of the "ruby" filter plugin. Could that be related? IIRC, the memory leak was related to JRuby... Is JRuby involved only when the "ruby" filter plugin is being used?

  3. My setup is composed of 10 identical pipelines. Does that setup increase the effect of the memory leak? Would it be better if I reduced them to 5 pipelines, for example?

Thanks a lot.
Jose

Are you using the Monitoring functionality to track things?

Not really. A simple "ps" command is enough to see how memory usage grows every day.

Is heap usage increasing over time or is it total memory reported used by the process? Are you using the file input plugin or persistent queues?
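To tell heap growth apart from growth in total process memory (RSS), the heap figures can be read from the Logstash node stats API and compared with what "ps" reports. The following is a minimal sketch, assuming the API is reachable on its default address `localhost:9600` and that the `jvm.mem` section exposes `heap_used_in_bytes`, `heap_max_in_bytes` and `heap_used_percent`; the helper names `heap_summary` and `fetch_jvm_stats` are made up for illustration.

```ruby
require "json"
require "net/http"
require "uri"

# Extract the heap figures from a node stats JSON document.
# Assumption: the document has a "jvm" -> "mem" section with these keys.
def heap_summary(stats_json)
  mem = JSON.parse(stats_json).fetch("jvm").fetch("mem")
  {
    used_bytes: mem.fetch("heap_used_in_bytes"),
    max_bytes: mem.fetch("heap_max_in_bytes"),
    used_percent: mem.fetch("heap_used_percent")
  }
end

# Fetch the JVM stats from a running Logstash instance.
# Assumption: the API listens on localhost:9600.
def fetch_jvm_stats(base_url = "http://localhost:9600")
  Net::HTTP.get(URI("#{base_url}/_node/stats/jvm"))
end

# Usage against a live instance:
#   puts heap_summary(fetch_jvm_stats)
```

If the heap figures stay flat over several days while RSS keeps climbing, the growth is happening outside the Java heap, which points away from objects held by the pipeline itself.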

I am checking RSS.
Input plugin is "beats".

Are you only using beats inputs? Have you got persistent queues configured? Do you have monitoring installed?

Nope, I don't have persistent queues. I admit I didn't know about them until now :slight_smile:
No monitoring.

I came across a tool named GCeasy. It analyzes GC logs, which can otherwise be cryptic, and is meant to help detect memory leaks, long GC pauses, premature object promotions and other performance problems.

It is unclear to me whether any of the comments help answer my original questions...

Can you show your config or list the plugins you are using? If you have any plugins that rely on files being read, the reported memory usage could be growing due to files being cached by the OS, which is not a memory leak. If that were the case, I would expect reported memory use to go down once Logstash is restarted.

Sure, here it is. It is a mock-up, for privacy.

The key aspect is that I use the Ruby filter plugin, which keeps in memory several Hashes, where each key value is a combination of hostname and logfile for the source input data. That is a finite number, so at some point it should have created all possible values.

I also guess those Hashes are eventually recycled when the value of the key is repeated, as they are class variables, correct? Ruby is not keeping all of them in memory forever, right?
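On the Ruby side of that question, a small sketch may help. Variables written as `@var1` are instance variables (class variables would be `@@var1`), and assigning to a key that already exists replaces the stored value instead of adding an entry, so a Hash keyed by origin holds at most one entry per distinct origin. The origins and values below are invented for illustration, and `cache` is a hypothetical stand-in for one of the `@varN` hashes.

```ruby
# Hypothetical stand-in for one of the @varN hashes in the ruby filter.
cache = Hash.new

# Simulated events: the same two origins report new values repeatedly.
events = [
  ["app.log@host1", "a"],
  ["app.log@host2", "b"],
  ["app.log@host1", "c"],
  ["app.log@host2", "d"],
  ["app.log@host1", "e"]
]

events.each do |origin, value|
  # Overwrites any previous value stored for this origin.
  cache[origin] = value
end

distinct_origins = events.map(&:first).uniq.size

puts "entries: #{cache.size}, distinct origins: #{distinct_origins}"
```

The replaced values become eligible for garbage collection once nothing else references them. Whether the JVM then hands that memory back to the operating system is a separate matter, so a bounded Hash does not by itself guarantee a flat RSS.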

input {
    beats {
    }
}

# ========================================================================= 

filter {

    if [fields][log_type] == "foo" {

        mutate {
            split => ["source", "/"]
            add_field => { "logfile" => "%{[source][-1]}"  }
            add_field => { "origin" => "%{logfile}@%{[beat][hostname]}"  }
        }

        grok {
            pattern_definitions => { ... }
            match => { "message" => [
               ...
            ] }
            break_on_match => false
        }


        date {
            match => ["eventtimestamp", "MM/dd/yy HH:mm:ss.SSS"]
            target => "eventtimestamp"
        }

        ruby {

            init => '
                @var1 = Hash.new
                @var2 = Hash.new
                @var3 = Hash.new
                @var4 = Hash.new
                @var5 = Hash.new
                @var6 = Hash.new
                ...
            '

            code => '

                @origin = event.get("origin")

                if ....
                    @var1[@origin] = event.get("var1")
                elsif ...
                    @var2[@origin] = event.get("var2")
                elsif ...
                    @var3[@origin] = event.get("var3")
                elsif ...
                    event.set("var1", @var1[@origin])
                    event.set("var2", @var2[@origin])
                    event.set("var3", @var3[@origin])
                    ...
                else
                    event.cancel
                end
            '
        }

        mutate {
            add_field => { "executionhost" => "%{[host][name]}" }
        }


        prune {
            whitelist_names => ["...", "..."]
        }


    } else if [fields][log_type] == "bar" {
        #
        # similar workflow
        #
    }

}

# ========================================================================= 

output {
    lumberjack {
        codec => json
        hosts => "myremotehost"
        ssl_certificate => "/etc/logstash/certs/lumberjack.cert"
        port => 1234
    }
}
