Memory usage grows in (short-ish) time

We only recently started trying to use Logstash. The problem is that memory usage grows, at least within the first 2 days, to the point that we can't use it.

I tried reducing Xms and Xmx from the default of 1G to 512M, restarted, and made sure that these options were present in the java invocation (checked with ps). After a couple of hours I was back at 1.5GB of memory used.
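
For reference, on a deb install these flags live in /etc/logstash/jvm.options (path assumed for the standard package layout); a quick sanity check of the file before restarting looks something like this:

=$ grep -E '^-Xm[sx]' /etc/logstash/jvm.options
-Xms512m
-Xmx512m
=$ sudo systemctl restart logstash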

We use logstash 7.15.0-1 on Ubuntu Bionic. After 18 hours of runtime, with -Xms512m -Xmx512m, it is now using 1723004 kB of RSS:

=$ ps uw -p 56563
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     56563  7.7  0.4 26823120 1723004 ?    SNsl Oct14  85:46 /usr/share/logstash/jdk/bin/java -Xms512m -Xmx512m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djruby.compile.invoked

Immediately after restart it was ~350MB, but it grew very fast to 1.6GB, and then gradually to ~1.7GB now.
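
(For what it's worth, the growth is easy to watch with a trivial loop like this, just a sketch, pid as in the ps output above; it prints RSS in kB once a minute:)

=$ while true; do ps -o rss= -p 56563; sleep 60; done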

I know that, generally speaking, it's not that much memory, but 1.7GB for simple processing of log lines is strange. Also, we have some servers with only 8GB of RAM, and using 1.5GB of it there is simply not an option.

Our config:

input {      
    file {
        "path" => "/cache/postgres_logs/*.csv"
        "tags" => [ "PostgreSQL" ]                     
        "sincedb_path" => "/cache/postgres_logs/sincedb_pgsql"
        codec => multiline {                
            auto_flush_interval => 2   
            pattern => "^%{TIMESTAMP_ISO8601}"
            what => previous                                                
            negate => true         
        }                            
    }                            
                                             
    file {   
        "path" => "/cache/pgbouncer_logs/pgbouncer-*.log"
        "tags" => [ "pgBouncerOnPg" ]
        "sincedb_path" => "/cache/pgbouncer_logs/sincedb_pgbouncer"
    }    
}             
                                                                                                                                            
filter {                           
    if "PostgreSQL" in [tags] { 
        csv {
            # Columns list:
            # https://www.postgresql.org/docs/12/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-CSVLOG
            columns => [ "log_time", "user_name", "database_name", "process_id", "connection_from", "session_id", "session_line_num", "command_tag", "session_start_time", "virtual_transaction_id", "transaction_id", "error_severity", "sql_state_code", "sql_message", "detail", "hint", "internal_query", "internal_query_pos", "context", "query", "query_pos", "location", "application_name" ]
            skip_empty_columns => true
            convert => {
                "process_id" => "integer"                    
                "session_line_num" => "integer"
                "transaction_id" => "integer"
                "session_start_time" => "date_time"
                "internal_query_pos" => "integer"
                "query_pos" => "integer"               
            }                                                                                         
        }                                              
        mutate {                                               
            gsub => [ "sql_message", "[\r\n\t ]+", " "]
            add_field => {
                "sourcetype" => "PostgreSQL"
                "region" => "us-east-1"
                "az" => "us-east-1b"
                "host_fqdn" => "..."              
                "pgrole" => "..."
                "project" => "..."
                "cluster" => "94"                                              
                "environment" => "production"
            }                      
        }                          
        date {                         
            match => ["log_time", "YYYY-MM-dd HH:mm:ss.SSS z"]
        }                                                          
        grok {           
            match => ["sql_message", "duration: %{DATA:duration:float} ms +(statement|(parse|bind|execute) [^:]+): %{GREEDYDATA:statement}"]
            break_on_match => false
            tag_on_failure => []
        }   
                
        # Add normalized versions of fields that can contain queries
        # \x22 is " character                  
        ruby {                         
            code => "               
                %w{ sql_message query statement }.each do |k|               
                    ov = event.get(k)
                    next unless ov   
                    v = ov.downcase
                    v.gsub!( %r{\s+}, ' ')   
                    v.gsub!( %r{\s*/\*.*?\*/\s*}, ' ' )
                    v.gsub!( %r{\x22?(...|cluster\d+_shard_\d+|cluster\d+)\x22?\.}, 'SOME_SHARD.' )
                    v.gsub!( %r{'[^']*'|\b\d+\b}, '?' )
                    v.gsub!( %r{ in \([\s,?]+\)}, ' in (...)' )                                                                   
                    event.set( 'n_' + k, v )
                end
            "                                                   
        }
              
        # Add each element from marginalia as "marginalia_ORIGINAL_NAME" => ORIGINAL_VALUE
        ruby {
            code => "
                if m = event.get('sql_message').match( %r{/\*(.*?)\*/[\s;]*$} )
                    m[1].
                        split(',').
                        map { |s| s.split(':', 2) }.
                        each { |s|
                            event.set( 'marginalia_' + s[0], s[1] )
                        }
                end
            "
        }
    } else {
        mutate {
            add_field => {                                          
                "sourcetype" => "pgBouncerOnPg"
                "region" => "us-east-1"
                "az" => "us-east-1b"
                "host_fqdn" => "..."
                "pgrole" => "..."  
                "project" => "..."
                "cluster" => "94"  
                "environment" => "production"
            }                                          
        }                                                                                             
        grok {                                         
            match => ["message", "^(?<log_ts>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d\d\d \S+) \[%{INT:pid:int}\] %{WORD:log_level}"]
        }                                   
        grok {     
            match => ["path", "-%{INT:pgbouncer_port:int}.log$"]
        }
        date {
            match => ["log_ts", "YYYY-MM-dd HH:mm:ss.SSS z"]                              
        }     
    }                
                                                                               
}                        
                                   
output {                           
    http {                             
        http_method => "post"     
        url => "...."
        headers => ['Authorization', '....']
    }              
}            

Is there anything we could do about it?

This sounds like it's off-heap memory, which is predominantly managed by the OS and not Logstash.

It's likely being caused by the OS caching the files you are reading in. Are there lots of those?
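
One way to see where the memory sits from the JVM's own point of view is Native Memory Tracking. A sketch, assuming the bundled JDK ships jcmd and that restarting Logstash with one extra flag is acceptable:

# add to /etc/logstash/jvm.options, then restart Logstash:
-XX:NativeMemoryTracking=summary

# after it has run for a while (pgrep assumes a single Logstash java process):
=$ sudo /usr/share/logstash/jdk/bin/jcmd "$(pgrep -f org.logstash.Logstash)" VM.native_memory summary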

This is a DB server, so there are lots of files cached by the OS. But why would the OS cache show up as RSS in the logstash process?

This doesn't make much sense.

I looked at smaps for the logstash process. For example, on one of the servers I see:

=$ ps uww -C java
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     29932  1.2  1.6 13259576 2081660 ?    SNsl Oct15  44:14 /usr/share/logstash/jdk/bin/java -Xms512m -Xmx512m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djruby.compile.invokedynamic=true -Djruby.jit.threshold=0 -Djruby.regexp.interruptible=true -XX:+HeapDumpOnOutOfMemoryError -Djava.security.egd=file:/dev/urandom -Dlog4j2.isThreadContextMapInheritable=true -cp /usr/share/logstash/logstash-core/lib/jars/animal-sniffer-annotations-1.14.jar:/usr/share/logstash/logstash-core/lib/jars/checker-compat-qual-2.0.0.jar:/usr/share/logstash/logstash-core/lib/jars/commons-codec-1.14.jar:/usr/share/logstash/logstash-core/lib/jars/commons-compiler-3.1.0.jar:/usr/share/logstash/logstash-core/lib/jars/commons-logging-1.2.jar:/usr/share/logstash/logstash-core/lib/jars/error_prone_annotations-2.1.3.jar:/usr/share/logstash/logstash-core/lib/jars/google-java-format-1.1.jar:/usr/share/logstash/logstash-core/lib/jars/gradle-license-report-0.7.1.jar:/usr/share/logstash/logstash-core/lib/jars/guava-24.1.1-jre.jar:/usr/share/logstash/logstash-core/lib/jars/j2objc-annotations-1.1.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-annotations-2.9.10.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-core-2.9.10.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-databind-2.9.10.8.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-dataformat-cbor-2.9.10.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-dataformat-yaml-2.9.10.jar:/usr/share/logstash/logstash-core/lib/jars/janino-3.1.0.jar:/usr/share/logstash/logstash-core/lib/jars/javassist-3.26.0-GA.jar:/usr/share/logstash/logstash-core/lib/jars/jruby-complete-9.2.19.0.jar:/usr/share/logstash/logstash-core/lib/jars/jsr305-1.3.9.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-1.2-api-2.14.0.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-api-2.14.0.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-core-2.14.0.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-jcl-2.14.0.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-slf4j-impl-2.14.0.jar:/usr/share/logstash/logstash-core/lib/jars/logstash-core.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.commands-3.6.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.contenttype-3.4.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.expressions-3.4.300.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.filesystem-1.3.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.jobs-3.5.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.resources-3.7.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.runtime-3.7.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.app-1.3.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.common-3.6.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.preferences-3.4.1.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.registry-3.5.101.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.jdt.core-3.10.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.osgi-3.7.1.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.text-3.5.101.jar:/usr/share/logstash/logstash-core/lib/jars/reflections-0.9.11.jar:/usr/share/logstash/logstash-core/lib/jars/slf4j-api-1.7.30.jar:/usr/share/logstash/logstash-core/lib/jars/snakeyaml-1.23.jar org.logstash.Logstash --path.settings 
/etc/logstash

So, it's 2GB of RSS vs. 512MB of Xms/Xmx.

The smaps_rollup for this process looks like this:

=$ sudo cat /proc/29932/smaps_rollup 
e0000000-ffffd1025000 ---p 00000000 00:00 0                              [rollup]
Rss:             2093716 kB
Pss:             2091375 kB
Pss_Anon:        2074376 kB
Pss_File:          16999 kB
Pss_Shmem:             0 kB
Shared_Clean:       2424 kB
Shared_Dirty:          0 kB
Private_Clean:     16904 kB
Private_Dirty:   2074388 kB
Referenced:      2093716 kB
Anonymous:       2074376 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB

So it seems to be all private-dirty memory. Why?
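
In case it helps, the individual mappings can be listed and sorted by size with something like this (just a sketch; column 3 of pmap -x is RSS in kB):

=$ sudo pmap -x 29932 | sort -n -k3 | tail -20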

How many of these log files are there, and how often are they being written to?

2 new files every hour. One for pgbouncer, one for postgresql.

pgbouncer generally writes one line every minute, switching to new log every hour.

postgresql - well, it's a bit more complex, but logs from 3 days on one server contain ~2.7 million rows across 76 files (sometimes a new file is generated before the full hour is up).
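
(Rough counts, using the paths from the config above, just as a sketch:)

=$ ls /cache/postgres_logs/*.csv | wc -l
=$ cat /cache/postgres_logs/*.csv | wc -l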
