BulkProcessor code not working with refresh_interval as 5s

jsbonline2006 · November 29, 2013, 5:09pm

Hi All,

We are using BulkProcessor to index the data.
The initial indexing with refresh_interval set to -1 worked fine for us.
However, when we changed it to 5s, bulk requests started failing with the following error.

2013-11-29 16:50:12,142 DEBUG IndexJSONData:148 - Error executing bulk :
org.elasticsearch.transport.RemoteTransportException: Failure in response -
failure in bulk execution:
[1]: index [test_data], type [data], id [eppPLplcommonHC684ZM/A], message
[UnavailableShardsException[[test_data][3] [1] shardIt, [0] active :
Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@441377e]]
[3]: index [test_data], type [data], id [eppPLplcommonHC686ZM/A], message
[UnavailableShardsException[[test_data][5] [1] shardIt, [0] active :
Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@5414bf84]]
[4]: index [test_data], type [data], id [eppPLplcommonHC687ZM/A], message
[UnavailableShardsException[[test_data][6] [1] shardIt, [0] active :
Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@365821d3]]
[5]: index [test_data], type [data], id [eppPLplcommonHC688ZM/A], message
[UnavailableShardsException[[test_data][7] [1] shardIt, [0] active :
Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@6e062232]]
[6]: index [test_data], type [data], id [eppPLplcommonHC689ZM/A], message
[UnavailableShardsException[[test_data][8] [1] shardIt, [0] active :
Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@32d619b1]]
[8]: index [test_data], type [data], id [eppPLplcommonHC691ZM/A], message
[UnavailableShardsException[[test_data][3] [1] shardIt, [0] active :
Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@441377e]]
[10]: index [test_data], type [data], id [eppPLplcommonHC693ZM/A], message
[UnavailableShardsException[[test_data][5] [1] shardIt, [0] active :
Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@5414bf84]]
2013-11-29 16:50:12,142 DEBUG IndexJSONData:120 - Number of requests in
Bulk batch = 11
2013-11-29 16:50:12,142 DEBUG IndexJSONData:123 - START TIME: for Batch
74849 2013/11/29 16:50:12.142

8 out of 10 shards are down:
{"ok":true,"_shards":{"total":10,"successful":2,"failed":0}

We want to index the incremental data with refresh_interval set to 5s or
30s. Please let us know whether we can still use BulkProcessor for this, or
whether we need a different API to index the data.

It would be very helpful if you could point out the exact Java API.

Also, if we are missing anything in the index settings, please let us know.

Thanks and Regards,
Jayesh Bhoyar

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5b745fe5-1237-4866-b1d5-407fac392e52%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · November 30, 2013, 10:54am

You should set up monitoring and check whether all nodes behave as they
should. The log output you gave is not enough to get a clear picture; it
only shows that the client waited more than a minute for a response that
never arrived.

If you enable refresh during bulk indexing, you put additional load on the
indexing process, because refresh is invoked more often. After a while,
depending on the segment merge activity, nodes may become so busy that they
are no longer reachable, for instance if they were not configured for your
workload. This is not specific to BulkProcessor; it applies to all indexing
activity. Bulk processing just exposes the situation more often.
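
One way to act on this is to keep refresh disabled while the bulk load runs and restore the interval afterwards. The sketch below does that through the REST update-settings endpoint (PUT /{index}/_settings) using only the JDK. The class name, the helper names, and the base URL passed on the command line are assumptions for illustration; test_data is the index name from the log above. Treat it as a rough outline, not a tuned recipe.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Elasticsearch client.
public class RefreshIntervalSketch {

    // Builds the JSON body for the index update-settings call.
    static String settingsBody(String interval) {
        return "{\"index\":{\"refresh_interval\":\"" + interval + "\"}}";
    }

    // Sends PUT {baseUrl}/{index}/_settings and returns the HTTP status code.
    static int putRefreshInterval(String baseUrl, String index, String interval)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(baseUrl + "/" + index + "/_settings").toURL().openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(settingsBody(interval).getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        // Without arguments, only print the request bodies (no network access).
        System.out.println("before bulk load: " + settingsBody("-1"));
        System.out.println("after bulk load:  " + settingsBody("5s"));
        if (args.length == 1) {
            String baseUrl = args[0]; // e.g. http://localhost:9200 (assumed address)
            System.out.println("disable refresh -> HTTP "
                    + putRefreshInterval(baseUrl, "test_data", "-1"));
            // ... run the bulk indexing here ...
            System.out.println("restore refresh -> HTTP "
                    + putRefreshInterval(baseUrl, "test_data", "5s"));
        }
    }
}
```

If the data must become searchable while indexing is still running, a longer interval such as 30s is a middle ground between -1 and 5s.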

It may be that your nodes are too few or too small to handle the
additional load, that the I/O subsystem (disk, file system) of the nodes is
too slow, or that a server has trouble keeping pace.

To streamline bulk processing, you should configure the BulkProcessor
properly: the number of concurrent bulk requests and the size of each bulk
request are critical.
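
On the BulkProcessor builder these knobs correspond to setBulkActions, setBulkSize and setConcurrentRequests. To illustrate how the two size thresholds interact, here is a small stand-in written against the JDK only. It is not Elasticsearch code: the class BulkBatchingSketch and all of its members are hypothetical, and it leaves out concurrency and the actual network send. It flushes a batch as soon as either the action count or the buffered byte size reaches its limit.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for bulk batching; not the Elasticsearch BulkProcessor.
public class BulkBatchingSketch {
    private final int maxActions;   // flush after this many documents
    private final long maxBytes;    // or after this many buffered bytes
    private final List<String> buffer = new ArrayList<>();
    private long bufferedBytes = 0;

    // Records the size of every flushed batch so the behaviour can be inspected.
    final List<Integer> flushedBatchSizes = new ArrayList<>();

    BulkBatchingSketch(int maxActions, long maxBytes) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
    }

    // Buffers one document and flushes when either threshold is reached.
    void add(String jsonDoc) {
        buffer.add(jsonDoc);
        bufferedBytes += jsonDoc.getBytes(StandardCharsets.UTF_8).length;
        if (buffer.size() >= maxActions || bufferedBytes >= maxBytes) {
            flush();
        }
    }

    // A real client would send the batch here; the sketch only records its size.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushedBatchSizes.add(buffer.size());
        buffer.clear();
        bufferedBytes = 0;
    }

    int pending() {
        return buffer.size();
    }

    public static void main(String[] args) {
        BulkBatchingSketch batcher = new BulkBatchingSketch(2, 1024);
        for (int i = 0; i < 5; i++) {
            batcher.add("{\"n\":" + i + "}");
        }
        // prints "[2, 2] pending=1"
        System.out.println(batcher.flushedBatchSizes + " pending=" + batcher.pending());
    }
}
```

Smaller batches and fewer concurrent requests put less pressure on each node at a time, at the cost of throughput; suitable values depend on document size and on the cluster, so they are best found by measuring.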

Jörg
