Initial Upload ElasticSearch 6.3 Bulk insert slows to a crawl

bach942 · August 10, 2018, 7:26pm

I'm trying to upload about 7 million documents to ES 6.3 and I've been running into and issue where the bulk upload slows to a crawl at about 1 million docs (I have no documents previous to this in the index).

I have a 3 node ES setup with 16GB with 8GB JVM settings, 1 index, 5 shards.
I have turned off refresh ("-1"), set replica to 0, increased the index buffer size to 30%.

On my upload side I have 22 threads running 150 docs per request of bulk insert.

For all of my nodes and upload machines the CPU, Memory, SSD Disk IO is low.

I've been able to get about 30k-40k inserts per/minute, but that seems really slow to me since others have been able to do 2k-3k per/sec. My documents do have nested json, but they don't seem to be very large to me (Is there way to check a single size doc or average?).

I would like to be able to bulk upload these documents in less than 12 - 24hrs and seems like ES should handle that, but once I get to 1 million it seems like it slows to a crawl.

I'm pretty new to ES so any help would be appreciated. I know this seems like question that has already been asked, but I've tried just about everything that I could find and wonder why my upload speed is a factor slower.

I've also checked the logs and only saw some errors about mapping field couldn't change, but nothing about memory over or anything like that.

ES 6.3 is great, but I'm also finding that the API has changed a bunch to 6 and settings that people were using are no longer supported.

bach942 · August 11, 2018, 2:47pm

I think I found a bottleneck at the active connections to my original database and increased that connection pool which helped.

I also tried an experiment on a big machine, that is used to run the upload job, running 80 threads at 1000 document uploads each. I did some calculations and found out that my documents are about 7-10k per document so doing uploads of 7-10MBs each bulk index. This got to the document count faster to 1M, but once you get there everything slows to a crawl. The machines stats are still really low. I do see output of the threads about every 5 mins or so on the logs for the job, about the same time I see the ES count change.

The ES machines still have low CPU, Memory. The IO is around 3.85MBs and the Network Bandwidth was at 55MBs and drops to about 20MBs.

bach942 · August 12, 2018, 4:05am

I was able to find out that my issue had to do with my database where the original data was coming from. I was using LIMIT and OFFSET and moved to the NO-OFFSET was of walking through the data and the performance increased significantly.

system · September 9, 2018, 4:05am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Fast Bulk upload Elasticsearch	3	818	June 14, 2018
Elasticsearch bulk slows down after a certain amount of documents Elasticsearch	4	1471	April 24, 2020
Looking for advice on bulk loading Elasticsearch	6	937	July 6, 2017
Bulk loading performance slow & varies in ES 2.1.0 Elasticsearch	1	636	July 5, 2017
BulkProcessor Indexing Performance Elasticsearch	1	1174	November 1, 2017

Initial Upload ElasticSearch 6.3 Bulk insert slows to a crawl

Related topics