Indexing 100 million+ records from PostgreSQL

I just want to know what specs and configuration I need to index 100 million+ records. Right now I have this much free memory:

              total        used        free      shared  buff/cache   available
Mem:          64410        2782       25389       13663       36237       47315

I was able to index a couple of million records before, but this dataset is much larger. I have increased the heap space to 16 GB.

I am using Ubuntu 18.x.

Here's my config:

# file: my.conf
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://localhost/db?user=[USER]&password=[PASS]"
    jdbc_user => "postgres"
    jdbc_driver_library => ""
    jdbc_driver_class => "org.postgresql.Driver"
    statement => "SELECT a.app_id, a.text, CASE WHEN b.granted = 'f' THEN 0 ELSE 1 END as isgrant FROM tbl1 as a LEFT JOIN tbl2 as b ON a.app_id = b.app_id"
  }
}

output {
  elasticsearch {
    index => "idx"
    document_id => "%{app_id}"
    hosts => "http://localhost:9200"
  }
}

Please help.

There is no way to answer that question. A t2.micro single-CPU cloud instance with 500 MB of heap can feed a billion records to elasticsearch, although it will probably take longer than a larger instance (unless your elasticsearch instance is the bottleneck).

If you are not using any filters and are just pulling data out of jdbc, I would not expect you to need a massive amount of memory.
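
One thing that tends to keep memory flat on a result set this size is letting the jdbc input page through the query rather than pulling the whole result set at once. A minimal sketch of the relevant input settings, reusing your query; the page and fetch sizes are illustrative values you would need to tune:

# Sketch only: paging/fetch settings for very large result sets.
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://localhost/db?user=[USER]&password=[PASS]"
    jdbc_user => "postgres"
    jdbc_driver_library => ""
    jdbc_driver_class => "org.postgresql.Driver"
    # Have the plugin split the query into LIMIT/OFFSET pages instead of
    # materializing all 100M+ rows in one go.
    jdbc_paging_enabled => true
    jdbc_page_size => 100000      # rows per page, illustrative
    # Fetch-size hint passed to the JDBC driver for each page.
    jdbc_fetch_size => 10000      # illustrative
    statement => "SELECT a.app_id, a.text, CASE WHEN b.granted = 'f' THEN 0 ELSE 1 END as isgrant FROM tbl1 as a LEFT JOIN tbl2 as b ON a.app_id = b.app_id"
  }
}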

I suggest you experiment. Measure the number of documents that you can index per second on a given set of hardware. Decide if it is fast enough. If not, start figuring out whether the bottleneck is the database, logstash, or elasticsearch and tune that subsystem. Then measure again and repeat. Nobody else can do this for you.
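
One common way to check whether elasticsearch is the bottleneck (not the only way, just a cheap one) is to temporarily swap the elasticsearch output for a stdout output with the dots codec and time how fast events drain from the database through logstash on their own. A sketch:

# Temporary output for throughput testing: prints one dot per event,
# taking elasticsearch out of the picture entirely.
output {
  stdout { codec => dots }
}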


Thanks! This means more work for me :slight_smile:
