Bulkload performance issue

We are using Java to bulk load documents into Elasticsearch. We plan to import 10 million documents, and each document is almost 8 MB. At the moment we can only import about 400K documents per day, roughly 5 documents per second. Our ES infrastructure is 3 master nodes with 4 GB ES_JAVA_OPTS (heap size), 2 data nodes, and 2 client nodes with 2 GB memory. Whenever we try to increase the bulk load speed, we run into heap size errors. Any advice for improvement?
The disk I/O of the node is shown below. We set up the ES cluster on Kubernetes.
    dd if=/dev/zero of=/data/tmp/test1.img bs=1G count=10 oflag=dsync
    10737418240 bytes (11 GB) copied, 50.7528 s, 212 MB/s

    dd if=/dev/zero of=/data/tmp/test2.img bs=512 count=100000 oflag=dsync
    51200000 bytes (51 MB) copied, 336.107 s, 152 kB/s

    // 200,000 bulk requests of 50 documents each (~8 MB per document)
    for (int x = 0; x < 200000; x++) {
        BulkRequest bulkRequest = new BulkRequest();
        for (int k = 0; k < 50; k++) {
            Order order = generateOrder();
            IndexRequest indexRequest = new IndexRequest("orderpot", "orderpot");
            Object esDataMap = objectToMap(order);
            String source = JSONObject.valueToString(esDataMap);
            indexRequest.source(source, XContentType.JSON);
            bulkRequest.add(indexRequest);
        }
        // synchronous bulk call: roughly 50 x 8 MB of JSON held in memory per request
        rhlclient.bulk(bulkRequest, RequestOptions.DEFAULT);
    }

Indexing performance will depend on hardware as well as the size and complexity of your documents, and 8 MB is massive. Why are they so large? How are you going to query these huge documents?

Disk speed can also play a part, but I am not sure to what extent that is the case here as I have never indexed massive documents like that.

Given the size and volume of your documents, I would not be surprised if you needed more heap on your data nodes.
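
For reference, with 50 documents of roughly 8 MB each, every bulk request in your loop holds on the order of 400 MB of serialized JSON, which is a lot against 2-4 GB heaps. Below is a minimal, untested sketch of how the high-level client's BulkProcessor can flush by accumulated byte size instead of a fixed 50 documents, so each request stays much smaller. It reuses the `rhlclient`, `Order`, `generateOrder` and `objectToMap` names from your snippet and assumes the org.json `JSONObject` you are already using; the 64 MB / 10-action limits are only placeholders to tune against your own heap settings.

    import java.util.concurrent.TimeUnit;

    import org.elasticsearch.action.bulk.BulkProcessor;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.common.unit.ByteSizeUnit;
    import org.elasticsearch.common.unit.ByteSizeValue;
    import org.elasticsearch.common.xcontent.XContentType;
    import org.json.JSONObject;

    // Listener hooks; left empty here, but afterBulk is where you would log failures.
    BulkProcessor.Listener listener = new BulkProcessor.Listener() {
        @Override public void beforeBulk(long id, BulkRequest request) { }
        @Override public void afterBulk(long id, BulkRequest request, BulkResponse response) { }
        @Override public void afterBulk(long id, BulkRequest request, Throwable failure) { }
    };

    // Flush when the buffered actions reach ~64 MB or 10 documents, whichever comes first,
    // and allow only one bulk request in flight at a time.
    BulkProcessor bulkProcessor = BulkProcessor.builder(
            (request, bulkListener) -> rhlclient.bulkAsync(request, RequestOptions.DEFAULT, bulkListener),
            listener)
        .setBulkActions(10)
        .setBulkSize(new ByteSizeValue(64, ByteSizeUnit.MB))
        .setConcurrentRequests(1)
        .build();

    // Same document generation as in your loop; BulkProcessor decides when to send.
    for (int i = 0; i < 10_000_000; i++) {
        Order order = generateOrder();
        String source = JSONObject.valueToString(objectToMap(order));
        bulkProcessor.add(new IndexRequest("orderpot", "orderpot").source(source, XContentType.JSON));
    }
    bulkProcessor.awaitClose(30, TimeUnit.MINUTES);

Byte-capped bulk requests mainly reduce pressure on the coordinating node's heap; they do not change how much work the data nodes do per document, so you may still need more heap there as noted above.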

