GC warnings and timeout exceptions on bulk update


(Spiral Circ) #1

Hi,

I am trying to bulk update about 2.3 million documents in batches of 100K.
After about 400K - 500K documents I start seeing the exceptions below on the client side, and I see a lot of GC logging on the Elasticsearch console (pasted below).

I am doing this in two steps: I get the IDs of all the documents using the scroll API (which works fine), and then do a bulk update in batches of 100K.

If anyone can point out whether I am doing anything wrong, or whether there is a better way to do this, that would be great.

Client bulk update code:

            while (true) {
                count++;
                String id = idQueue.take();    //BlockingQueue<String>, ids obtained using search/scroll api

                updateRequest = new UpdateRequest(index, type, id).script
                        ("ctx._source.TOTAL_COMPENSATION = Double.parseDouble(ctx._source.SALARY) + Double.parseDouble(ctx._source.BONUS)");
                bulkRequest.add(updateRequest);
                if (count % 100000 == 0) {
                    BulkResponse bulkItemResponses = bulkRequest.execute().actionGet();
                    System.out.println("UPDATED  " + count + " Took " + bulkItemResponses.getTook());
                    bulkRequest = transportClient.prepareBulk();
                }
            }

Client Log:

UPDATED + 500000 Took 2.3m
Sep 13, 2015 11:48:36 PM org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler doSample
INFO: [Jocasta] failed to get node info for [#transport#-1][localhost][inet[localhost/127.0.0.1:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9300]][cluster:monitor/nodes/info] request_id [142] timed out after [5001ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Sep 13, 2015 11:48:38 PM org.elasticsearch.transport.TransportService$Adapter checkForTimeout
WARNING: [Jocasta] Received response for a request that has timed out, sent [7929ms] ago, timed out [2928ms] ago, action [cluster:monitor/nodes/info], node [[#transport#-1][localhost][inet[localhost/127.0.0.1:9300]]], id [142]
UPDATED + 600000 Took 2.5m
Sep 13, 2015 11:49:52 PM org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler doSample

Elasticsearch console:

[2015-09-13 23:54:49,830][WARN ][monitor.jvm ] [Mad Jack] [gc][young][968][244] duration [2.2s], collections [1]/[3s], total [2.2s]/[5.1m], memory [9.1gb]->[8.8gb]/[11.9gb], all_pools {[young] [468.3mb]->[10.7mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [8.6gb]->[8.7gb]/[11.3gb]}
^C[2015-09-13 23:55:17,172][WARN ][monitor.jvm ] [Mad Jack] [gc][young][986][245] duration [8.7s], collections [1]/[8.9s], total [8.7s]/[5.3m], memory [9.2gb]->[8.9gb]/[11.9gb], all_pools {[young] [489mb]->[944.6kb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [8.7gb]->[8.8gb]/[11.3gb]}


(Magnus Bäck) #2

It looks like you're simply pushing data faster than your current cluster can handle. Have you tried reducing the batch size? I don't think 100k batches give better performance than, say, 10k.
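
As a rough illustration of the smaller-batch idea, here is a minimal sketch (Java 8+) that groups IDs into fixed-size batches and hands each batch to a callback. `IdBatcher` and its `sink` callback are hypothetical names, not part of the Elasticsearch client; in the code from the question, the sink would be where the bulk request is built and executed. It also sends any final partial batch, which the loop above would leave unsent if the total is not a multiple of the batch size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Collects document ids and hands them to a sink in fixed-size batches. */
public class IdBatcher {
    private final int batchSize;
    // Hypothetical hook: in real use this would build and execute one bulk request.
    private final Consumer<List<String>> sink;
    private List<String> pending = new ArrayList<>();
    private int flushedBatches = 0;

    public IdBatcher(int batchSize, Consumer<List<String>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Queues one id and sends a batch as soon as it is full. */
    public void add(String id) {
        pending.add(id);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is queued; call once at the end for the last partial batch. */
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = pending;
        pending = new ArrayList<>();   // start a fresh list so the sink owns the old one
        sink.accept(batch);
        flushedBatches++;
    }

    public int flushedBatches() {
        return flushedBatches;
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        // 10k per batch, as suggested above; the sink here only records batch sizes.
        IdBatcher batcher = new IdBatcher(10_000, batch -> sizes.add(batch.size()));
        for (int i = 0; i < 25_000; i++) {
            batcher.add("doc-" + i);
        }
        batcher.flush();               // send the trailing 5,000
        System.out.println(sizes);     // prints [10000, 10000, 5000]
    }
}
```

The batch size is a constructor argument so it is easy to try 10k, 5k, and so on, and compare the reported bulk times and the GC logging on the server.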


#3

I'm getting the same error. How can I check how much data my cluster can handle? Is it something I can change in the Elasticsearch config file or in filebeat.yaml?

