Bulk import into ES


#1

I have 50 million records to import into ES.

What sort of configuration would I need for this: how many nodes, how many clusters, and so on? I am using the NEST API for C#.

How can I monitor whether the import is running optimally?


(Nik Everett) #2

It really depends on lots of things: the size of the documents, the hardware, etc. You should try it. I'd start with one machine for that.

You can read the logs for things like merges being throttled.

You can check the stack traces for things you don't expect.
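For the monitoring side of the question, a couple of stock REST calls usually cover both points above; a sketch, assuming the index is named `myindex` (a placeholder):

```
GET /_nodes/hot_threads

GET /myindex/_stats/indexing,merge
```

The first dumps the hottest thread stack traces on each node (bulk-heavy imports normally show indexing and merge threads there; anything else is worth a look), and the second returns indexing and merge statistics for the index, including time spent throttled.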

Mostly, though, you should be testing different batch sizes for the bulk API and trying to find a good one for your data.
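One way to run that experiment, sketched here in Python rather than C#/NEST since it only concerns the shape of the `_bulk` request: chop the record stream into fixed-size batches, render each batch as the NDJSON body the bulk API expects, and time a few candidate sizes against the cluster. The index name `myindex`, the helper names, and the sizes are all placeholders:

```python
import json

def bulk_batches(docs, index, batch_size):
    """Yield one NDJSON _bulk body per batch_size documents.

    Hypothetical helper: split the 50M-record stream into fixed-size
    batches so different sizes (1k, 5k, 10k, ...) can be timed.
    """
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield _render(batch, index)
            batch = []
    if batch:
        yield _render(batch, index)

def _render(batch, index):
    # Each document becomes two lines: an action/metadata line, then the source.
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline

# Time each candidate batch size over a sample of the data and keep the fastest.
docs = [{"id": i} for i in range(10)]
bodies = list(bulk_batches(docs, "myindex", 4))
print(len(bodies))  # -> 3 (batches of 4, 4, and 2 documents)
```

Each body would then be POSTed to `/_bulk`; the sweep matters because both too-small and too-large batches slow the import down.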

I've had lots of success tuning the refresh_interval setting on the index up to 30s or -1 (no automatic refresh) when doing bulk imports. That speeds them up a ton. You can set it with the update index settings API. If you set it to -1 you have to manually call _refresh after the process is done.
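For reference, that setting is dynamic, so it can be changed on a live index through the update index settings API; a sketch, with `myindex` again a placeholder:

```
PUT /myindex/_settings
{ "index": { "refresh_interval": "-1" } }

# ... run the bulk import, then refresh once and restore a normal interval:
POST /myindex/_refresh

PUT /myindex/_settings
{ "index": { "refresh_interval": "1s" } }
```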


(Eugeniy) #3

Hi, check these settings:
index.translog.flush_threshold_ops
index.translog.flush_threshold_size
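Both are index-level settings, so they can be raised for the duration of the import via the settings API; a sketch with arbitrary example values (`myindex` is a placeholder, and `flush_threshold_ops` only exists on the older Elasticsearch versions this thread was written against):

```
PUT /myindex/_settings
{
  "index": {
    "translog.flush_threshold_ops": 50000,
    "translog.flush_threshold_size": "1gb"
  }
}
```

Larger thresholds mean fewer translog flushes during the import, at the cost of more data to replay if a node dies mid-import.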

While configuring the nodes, do not forget to set "bootstrap.mlockall: true".
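That setting goes in elasticsearch.yml on each node. Note that locking the heap only works if the OS allows the process to lock that much memory (e.g. `ulimit -l unlimited` on Linux); otherwise Elasticsearch logs a warning at startup and runs without the lock:

```
# elasticsearch.yml -- lock the JVM heap in RAM so it is never swapped out
bootstrap.mlockall: true
```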

