Import data from DB causes OutOfMemory

Hi

I am trying to import data from a PostgreSQL DB, however it never completes the import because Logstash runs out of memory. If I limit the data in my select statement, the data imports successfully.

Current JVM args: -Xmx4000m -Xms256m -Xss2048k

The data is ~5 GB (7,475,233 records).
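For reference, these flags are passed to the JVM that runs Logstash; where they are set depends on the Logstash version and install method (for example config/jvm.options in newer releases, or the LS_JAVA_OPTS environment variable in older ones), roughly:

# config/jvm.options (location varies by install)
-Xms256m       # initial heap
-Xmx4000m      # maximum heap; the JDBC result set has to fit under this
-Xss2048k      # per-thread stack size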

Logstash Config
INPUT:

input {
  beats {
    host => "0.0.0.0"
    port => 10514
    tags => "syslog_index"
  }
  beats {
    host => "0.0.0.0"
    port => 10516
    tags => "metricbeat"
  }
  jdbc {
    # Postgres jdbc connection string to our database
    jdbc_connection_string => "jdbc:postgresql://server:5432/Database"
    # The user we wish to execute our statement as
    jdbc_user => "user"
    jdbc_password => "password"
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "/opt/logstash/postgresql-9.4-1204.jdbc41.jar"
    # The name of the driver class for Postgresql
    jdbc_driver_class => "org.postgresql.Driver"
    last_run_metadata_path => "/opt/logstash/logstash_jdbc_last_run"
    # our query
    statement => "SELECT * FROM table"
    tags => "test_index"
  }
}

OUTPUT:

output {
  if "test_index" in [tags] {
    elasticsearch {
      hosts => ["ES_SERVER:9200"]
      index => "test_index"
      document_type => "table"
      document_id => "%{table_id}"
      sniffing => false
      timeout => "480"
    }
  }
}

The trouble here is that the input plugin is attempting to load all 7 million+ records at once, which, as you have indicated, is ~5 GB, larger than the ~4 GB maximum heap you have configured for the JVM running Logstash.

You can window your query using the sql_last_value predefined parameter, so that the input will run multiple times, each time returning a new "page" that can fit into memory.

Its use depends a bit on the structure of your table. If you have an auto-incrementing integer key id, something like this could work:

input {
  jdbc {
    # ...
    statement => "SELECT * FROM table WHERE id > :sql_last_value ORDER BY id ASC"
    use_column_value => true
    tracking_column => "id"
  }
  # ...
}

Thanks for the tip, however Logstash still ran out of memory after a while. Do I also need to set a page size / number of rows along with :sql_last_value?

D'oh, yes. The whole point was to limit the page size and I failed to include that.

So something more like:

input {
  jdbc {
    # ...
    statement => "SELECT * FROM table WHERE id > :sql_last_value ORDER BY id ASC LIMIT 10000"
    use_column_value => true
    tracking_column => "id"
  }
  # ...
}
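One more note, depending on how you run Logstash: with no schedule the jdbc input executes the statement exactly once per pipeline start, so to keep fetching successive 10,000-row pages you either restart Logstash or add a cron-style schedule. The tracked id is persisted to the last_run_metadata_path you already configured, so each run picks up where the previous one left off. For example (the every-minute schedule is just an illustration, tune it to your data volume):

input {
  jdbc {
    # ...
    schedule => "* * * * *"  # re-run every minute; each run fetches the next page
    statement => "SELECT * FROM table WHERE id > :sql_last_value ORDER BY id ASC LIMIT 10000"
    use_column_value => true
    tracking_column => "id"
  }
  # ...
}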

Thanks, that worked perfectly :grinning:
