Initial upload to Elasticsearch 6.3: bulk insert slows to a crawl

I'm trying to upload about 7 million documents to ES 6.3 and I've been running into an issue where the bulk upload slows to a crawl at about 1 million docs (the index has no documents in it before the upload starts).

I have a 3-node ES setup with 16GB of RAM and an 8GB JVM heap, 1 index, 5 shards.
I have turned off refresh (refresh_interval set to "-1"), set replicas to 0, and increased the index buffer size to 30%.
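
For readers following along, here is a minimal Python sketch of how the two index-level settings above could be applied through the update index settings API. The index name `my_index` and the address `http://localhost:9200` are placeholders, not values from this setup. The index buffer size is a node-level setting (`indices.memory.index_buffer_size` in elasticsearch.yml), so it is not part of this request.

```python
import json
import urllib.request


def bulk_load_settings():
    """Index settings for an initial bulk load: refresh disabled, no replicas."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def build_settings_request(base_url, index_name):
    """Build (but do not send) a PUT /<index>/_settings request."""
    body = json.dumps(bulk_load_settings()).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index_name}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Placeholder host and index name; adjust to your cluster.
    req = build_settings_request("http://localhost:9200", "my_index")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply the settings: urllib.request.urlopen(req)
```

Once the load finishes, the same request with the refresh interval and replica count set back to their normal values would restore regular behaviour.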

On the upload side I have 22 threads, each sending bulk insert requests of 150 docs.
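
To make the request shape concrete, here is a rough Python sketch of splitting documents into batches of 150 and building the newline-delimited body the `_bulk` endpoint expects. It is not the actual upload job: the index name `my_index` and the type `_doc` are assumptions (6.x still expects a type, either in the action line or in the URL), and the threading and HTTP layers are left out.

```python
import json


def chunked(docs, size):
    """Yield consecutive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(batch, index_name, doc_type="_doc"):
    """Build an NDJSON bulk body: one action line, then one source line per doc."""
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": index_name, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = ({"id": i, "nested": {"value": i * 2}} for i in range(400))
    for batch in chunked(docs, 150):
        body = bulk_body(batch, "my_index")
        # POST `body` to /_bulk with Content-Type: application/x-ndjson
        print(len(batch), "docs,", len(body.encode("utf-8")), "bytes")
```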

On all of my nodes and upload machines, CPU, memory, and SSD disk I/O are low.

I've been able to get about 30k-40k inserts per minute, but that seems really slow to me since others have been able to do 2k-3k per second. My documents do have nested JSON, but they don't seem very large to me (is there a way to check the size of a single doc, or the average?).
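
On the size question, one simple client-side option is to measure the serialized JSON before sending it. The sketch below is only an approximation of what goes over the wire, and the sample documents are made up for illustration.

```python
import json


def doc_size_bytes(doc):
    """Size in bytes of the document serialized as compact UTF-8 JSON."""
    text = json.dumps(doc, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def size_stats(docs):
    """Return count, min, max and average serialized size for a sample of docs."""
    sizes = [doc_size_bytes(d) for d in docs]
    if not sizes:
        return {"count": 0, "min": 0, "max": 0, "avg": 0.0}
    return {
        "count": len(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "avg": sum(sizes) / len(sizes),
    }


if __name__ == "__main__":
    # Made-up sample documents with nested JSON.
    sample = [
        {"id": 1, "tags": ["a", "b"], "nested": {"name": "first"}},
        {"id": 2, "tags": [], "nested": {"name": "second", "extra": [1, 2, 3]}},
    ]
    print(size_stats(sample))
```

On the cluster side, dividing `store.size` by `docs.count` from `GET _cat/indices?v` gives a rough on-disk average, which will differ from the JSON size because of indexing overhead and compression.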

I would like to be able to bulk upload these documents in under 12-24 hours, and it seems like ES should handle that, but once I get to 1 million docs it slows to a crawl.

I'm pretty new to ES so any help would be appreciated. I know this seems like a question that has already been asked, but I've tried just about everything I could find and I still wonder why my upload speed is several times slower.

I've also checked the logs and only saw some errors about a mapping field that couldn't be changed, but nothing about running out of memory or anything like that.

ES 6.3 is great, but I'm also finding that the API changed quite a bit in 6.x and that settings people were using are no longer supported.

I think I found a bottleneck in the number of active connections to my source database; increasing that connection pool helped.

I also tried an experiment on a big machine that is used to run the upload job, running 80 threads with 1000 documents per bulk request. I did some calculations and found that my documents are about 7-10KB each, so each bulk request is about 7-10MB. This got the document count to 1M faster, but once it gets there everything slows to a crawl. The machine stats are still really low. I do see output from the threads about every 5 minutes in the job's logs, at about the same time I see the ES count change.
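
The request-size arithmetic above can be written out as a small Python sketch. It assumes 1KB = 1000 bytes for round numbers and, as a worst case, that every thread has one full request outstanding at the same time.

```python
MB = 1_000_000  # decimal megabyte, for round numbers


def bulk_request_bytes(docs_per_request, doc_bytes):
    """Approximate payload size of one bulk request."""
    return docs_per_request * doc_bytes


def in_flight_bytes(threads, docs_per_request, doc_bytes):
    """Approximate data outstanding if every thread has one request in flight."""
    return threads * bulk_request_bytes(docs_per_request, doc_bytes)


if __name__ == "__main__":
    for doc_bytes in (7_000, 10_000):
        per_request = bulk_request_bytes(1000, doc_bytes)
        total = in_flight_bytes(80, 1000, doc_bytes)
        print(f"{doc_bytes} B/doc: {per_request / MB:.0f} MB per request, "
              f"{total / MB:.0f} MB across 80 threads")
```

With 7-10KB documents this gives 7-10MB per request, and roughly 560-800MB outstanding if all 80 threads send at once.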

The ES machines still have low CPU and memory usage. Disk I/O is around 3.85MB/s, and network bandwidth was at 55MB/s and drops to about 20MB/s.

I was able to find out that my issue had to do with the database the original data was coming from. I was using LIMIT and OFFSET to page through the rows; after moving to the NO-OFFSET way of walking through the data (seeking past the last key seen instead of skipping rows), performance increased significantly.
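
For anyone hitting the same wall, here is a small Python sketch contrasting the two paging styles. It uses an in-memory SQLite database as a stand-in for the real source database, and the table `docs` with columns `id` and `payload` is hypothetical. With OFFSET the database has to step over all skipped rows on every page, so pages get slower the deeper you go; the NO-OFFSET (keyset) version seeks straight to the next key through the index.

```python
import sqlite3


def iter_offset(conn, page_size):
    """Page with LIMIT/OFFSET: each page re-scans the rows it skips."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM docs ORDER BY id LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not rows:
            return
        yield rows
        offset += len(rows)


def iter_keyset(conn, page_size):
    """Page without OFFSET: resume after the last id seen (assumes ids > 0)."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM docs WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]


def make_demo_db(n):
    """Create an in-memory table with n rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO docs (id, payload) VALUES (?, ?)",
        [(i, f"doc-{i}") for i in range(1, n + 1)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db(10)
    for page in iter_keyset(conn, 4):
        print([row[0] for row in page])
```

Both iterators return the same rows in the same order. The keyset version needs a unique, indexed sort column to work this way.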
