Thanks for your quick reply!
I forgot to mention: I've tested with both 12 and 24 shards (1 per CPU core, with and without HyperThreading), so I'd expect 1200-2400% CPU load. Mappings, analyzers and filters are already highly optimized / stripped down.
Two questions based on your reply:
- You mention 10 or 20 MB per _bulk request. Is there a way to measure what works best for my specific setup? Perhaps some scripts / tools that can benchmark for the optimal value?
- You mention that running multiple _bulk requests in parallel might be the way to go. Any idea why a single sequential _bulk process can't reach the full import throughput? I'd guess that importing huge datasets into Elasticsearch is not uncommon, so I'd expect it to be (quite) optimized.
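For the first question, this is roughly the kind of sweep I have in mind: build _bulk payloads up to a target byte size, time each candidate size, and compare docs/second. The `send_bulk` stub and the 1 KB synthetic documents are placeholders; a real run would POST each payload to the `_bulk` endpoint with actual data.

```python
import json
import time

def make_docs(n):
    # Synthetic ~1 KB documents; replace with a sample of real data.
    return [{"id": i, "body": "x" * 1000} for i in range(n)]

def chunk_by_bytes(docs, target_bytes):
    """Group docs into _bulk payloads of roughly target_bytes each.

    Each doc becomes an action line ({"index": {}}) plus a source line,
    matching the newline-delimited _bulk request body format.
    """
    batch, size = [], 0
    for doc in docs:
        line = json.dumps({"index": {}}) + "\n" + json.dumps(doc) + "\n"
        batch.append(line)
        size += len(line)
        if size >= target_bytes:
            yield "".join(batch)
            batch, size = [], 0
    if batch:
        yield "".join(batch)

def benchmark(docs, target_bytes, send_bulk):
    """Return indexing rate in docs/second for one payload size."""
    start = time.perf_counter()
    for payload in chunk_by_bytes(docs, target_bytes):
        send_bulk(payload)  # e.g. requests.post(url + "/_bulk", data=payload, ...)
    return len(docs) / (time.perf_counter() - start)

if __name__ == "__main__":
    docs = make_docs(5000)
    for mb in (5, 10, 20):
        rate = benchmark(docs, mb * 1024 * 1024, send_bulk=lambda p: None)
        print(f"{mb} MB bulks: {rate:.0f} docs/s")
```

Running the sweep against the real cluster (not the no-op stub) should make the sweet spot visible as the size past which docs/s stops improving.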
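On the second question, my working theory is that a single sequential _bulk loop leaves one side idle at any given moment: the client waits out the round-trip while the server indexes, then the server waits while the client builds the next payload. Overlapping a few requests keeps the pipeline full. A minimal sketch of what I'd try, with a stand-in sender in place of a real HTTP call (the official Python client also ships a `helpers.parallel_bulk` that does this properly):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_index(payloads, send_bulk, workers=4):
    """Send pre-built _bulk payloads concurrently.

    While one request is in flight or being indexed server-side,
    the other workers keep submitting, instead of idling on each
    round-trip like a single sequential loop does.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(send_bulk, payloads))

if __name__ == "__main__":
    # Stand-in payloads and sender; a real sender would POST to /_bulk.
    payloads = [f"payload-{i}" for i in range(8)]
    results = parallel_index(payloads, send_bulk=len, workers=4)
    print(results)
```

The worker count would presumably need tuning the same way as the payload size, since too many concurrent bulks can queue up and hurt latency instead.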
Thanks for your thoughts!