At first everything goes fine, and the speed is about 6 million records per hour. However, after about 2 hours I get the No node available exception... I'm pretty sure it's not a problem in my code that causes it.
I'm wondering whether ES (or the ES client) has some timeout configuration params?
You have overwhelmed the cluster, so it cannot respond within 5 seconds,
and you have not streamlined your bulk indexing. On the cluster side, consider
looking into segment merging and how big your segments have grown.
In the indexing code, you do not take care of the BulkResponses when just
executing bulkRequestBuilder.execute().actionGet(). Sooner or later this
must go wrong. Check if you can add a listener and wait for the cluster to
respond properly before continuing.
Also, in the scan/scroll request, note that setSize() is per shard. Check
whether pageSize * number of shards is the right size for a bulk request; it
may get too large.
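To make the per-shard point concrete, here is a tiny sketch (the class and method names are made up for illustration) of how the effective batch size per scroll round falls out of setSize() and the shard count:

```java
// setSize() on a scan/scroll request is applied PER SHARD, so one scroll
// round can return up to setSize * numberOfShards documents, and that is
// what ends up in a single bulk request if you forward the whole page.
public class ScrollSizing {
    public static int docsPerScrollRound(int pageSize, int shards) {
        return pageSize * shards;
    }

    public static void main(String[] args) {
        // With the figures from this thread: 2000 per shard * 5 shards
        System.out.println(docsPerScrollRound(2000, 5)); // 10000 docs per round
    }
}
```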
Thanks for the reply. I changed to BulkProcessor, which improved the indexing performance.
However, while pulling data from the source cluster, I got the same exception. This time the
problem was caused by searchScroll. Below is the exception message:
Exception in thread "main" org.elasticsearch.client.transport.NoNodeAvailableException: No node available
    at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:246)
    at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:214)
    at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:106)
    at org.elasticsearch.client.support.AbstractClient.searchScroll(AbstractClient.java:229)
    at org.elasticsearch.client.transport.TransportClient.searchScroll(TransportClient.java:410)
    at org.elasticsearch.action.search.SearchScrollRequestBuilder.doExecute(SearchScrollRequestBuilder.java:92)
    at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:62)
    at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:57)
My fetch size was set to 2000, with 5 shards in my source cluster, which means each scroll round fetches 10,000 documents. I don't know whether this happened because the fetching overwhelmed the cluster; if so, how can I avoid it? Should I reduce my fetch size? Or should I increase it so as to keep up with the insertion? (I have my BulkProcessor concurrentRequests set to 5, with the other params at their default values.)
I've got 70 million test records in my source cluster, and the exception happens after about 25 million have been migrated. Have you any idea on this?
Do you use monitoring tools for watching the cluster nodes?
Then you can find out how the resource usage develops until you reach 25
million. I predict you will notice the cluster entering a big segment merge
phase, plus the search load from your scan/scroll requests. Try to streamline
segment merging by either throttling it or reducing the maximum segment size
(default is 5 GB).
You should try a smaller value for setSize(), maybe 200 instead of
2000, to let the scan/scroll generate more manageable bulk request sizes.
The lifetime of a scroll request is very high at 2 minutes. During this
time the server must keep the found docs in memory, and this can easily pile up.
I would reduce it to 30 seconds or so. This will save resources on the
cluster nodes, but it must be balanced with the setSize() param to avoid
search timeouts.
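Putting these suggestions together, a rough sketch against the pre-2.x Java TransportClient API might look like the following. The client setup, index names, and batch sizes here are assumptions, not taken from the thread; treat it as an illustration of a smaller setSize(), a shorter scroll keep-alive, and a BulkProcessor listener that surfaces failures, rather than a drop-in solution.

```java
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.SearchHit;

public class ScrollMigration {

    // sourceClient/targetClient are assumed to be already-configured
    // TransportClients; "source-index"/"target-index" are placeholders.
    public static void migrate(Client sourceClient, Client targetClient) {
        BulkProcessor bulk = BulkProcessor.builder(targetClient,
                new BulkProcessor.Listener() {
                    @Override
                    public void beforeBulk(long id, BulkRequest request) {
                    }

                    @Override
                    public void afterBulk(long id, BulkRequest request,
                                          BulkResponse response) {
                        // Don't ignore per-item failures: they are the early
                        // warning sign that the cluster is falling behind.
                        if (response.hasFailures()) {
                            System.err.println(response.buildFailureMessage());
                        }
                    }

                    @Override
                    public void afterBulk(long id, BulkRequest request,
                                          Throwable failure) {
                        // Whole-request failure, e.g. NoNodeAvailableException.
                        failure.printStackTrace();
                    }
                })
                .setBulkActions(1000)     // flush every 1000 docs
                .setConcurrentRequests(2) // fewer in-flight bulks, less pressure
                .build();

        // Smaller page size (remember: per shard!) and a short scroll
        // keep-alive, as suggested above.
        SearchResponse scroll = sourceClient.prepareSearch("source-index")
                .setSearchType(SearchType.SCAN)
                .setScroll(TimeValue.timeValueSeconds(30))
                .setSize(200)
                .execute().actionGet();

        while (true) {
            scroll = sourceClient.prepareSearchScroll(scroll.getScrollId())
                    .setScroll(TimeValue.timeValueSeconds(30))
                    .execute().actionGet();
            if (scroll.getHits().getHits().length == 0) {
                break; // scan exhausted
            }
            for (SearchHit hit : scroll.getHits()) {
                bulk.add(new IndexRequest("target-index", hit.getType(), hit.getId())
                        .source(hit.getSourceAsString()));
            }
        }
        bulk.close(); // flushes any remaining docs
    }
}
```

The BulkProcessor's concurrentRequests setting is what actually bounds the in-flight load on the target cluster, so lowering it is usually the first knob to try when NoNodeAvailableException appears under sustained indexing.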