No node available while doing Elasticsearch data migration


(spancer ray) #1

Hi Guys,

I'm using the Java API to migrate a large volume of index data from one ES cluster to another. My code looks roughly like this:

Migration code:

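import static org.elasticsearch.index.query.QueryBuilders.matchAllQuery;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.Requests;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.SearchHit;

int pageSize = 10000;
String[] indices = { "myoldindex" };
Client sourceclient = ClientUtil.getSourceTransportClient();
// with SearchType.SCAN the first response carries only a scroll id, no hits
SearchResponse searchResponse = sourceclient.prepareSearch(indices).setSearchType(SearchType.SCAN)
        .setQuery(matchAllQuery()).setSize(pageSize).setScroll(TimeValue.timeValueMinutes(2)).execute().actionGet();
BulkRequestBuilder bulkRequestBuilder = ClientUtil.getTargetTransportClient().prepareBulk();
while (true)
{
    // fetch the next batch from the scroll and renew its keep-alive
    searchResponse = sourceclient.prepareSearchScroll(searchResponse.getScrollId())
            .setScroll(TimeValue.timeValueMinutes(2)).execute().actionGet();
    for (SearchHit hit : searchResponse.getHits())
    {
        // hit.getId() is the document id; _source normally has no "_id" field
        bulkRequestBuilder.add(Requests.indexRequest("myindex").type("myindexType").id(hit.getId())
                .source(hit.getSource()));
    }
    if (bulkRequestBuilder.numberOfActions() > 0)
    {
        bulkRequestBuilder.execute().actionGet();
        System.out.println(bulkRequestBuilder.numberOfActions());
        bulkRequestBuilder = ClientUtil.getTargetTransportClient().prepareBulk();
    }

    // an empty batch means the scroll is exhausted
    if (searchResponse.getHits().hits().length == 0)
    {
        break;
    }
}
if (bulkRequestBuilder.numberOfActions() > 0)
    bulkRequestBuilder.execute().actionGet();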

Client code:

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class ClientUtil {

    static Settings defaultSettings = ImmutableSettings.settingsBuilder().put("client.transport.sniff", true).build();

    private static TransportClient targetClient;

    private static TransportClient sourceClient;

    static {
        // TransportClient(Settings) is a public constructor, so no reflection is needed
        sourceClient = new TransportClient(defaultSettings);
        targetClient = new TransportClient(defaultSettings);
        sourceClient.addTransportAddress(new InetSocketTransportAddress("192.168.1.127", 9300));
        targetClient.addTransportAddress(new InetSocketTransportAddress("192.168.1.128", 9300));
    }

    // get instance
    public static synchronized Client getSourceTransportClient() {
        return sourceClient;
    }

    // get instance
    public static synchronized Client getTargetTransportClient() {
        return targetClient;
    }
}

At first everything goes fine, at about 6 million records per hour. However, after about 2 hours I get the "No node available" exception. I'm pretty sure it's not a concurrency problem that causes it.
Does ES (or the ES client) have any timeout configuration parameters?

Can anyone help? Any help would be greatly appreciated.

Spancer


(Jörg Prante) #2

You have overwhelmed the cluster so that it cannot respond within 5 seconds,
and you did not streamline your bulk indexing. On the cluster side, consider
looking into segment merging and how big your segments have grown.
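
The 5 seconds presumably refers to the transport client's ping timeout (client.transport.ping_timeout, default 5s): a node that does not answer a ping in time is treated as unavailable, and with no nodes left the client throws NoNodeAvailableException. These are the timeout settings asked about in the first post. The sketch below only collects the relevant ES 1.x setting keys in a plain Map; the class name and the example values are hypothetical, and raising the timeout may just hide the symptom while the cluster stays overloaded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: gathers ES 1.x transport client settings related to node
// detection. The values are caller-supplied and illustrative, not recommendations.
final class TransportClientTimeouts {

    private TransportClientTimeouts() {}

    static Map<String, String> settings(String pingTimeout, String samplerInterval) {
        Map<String, String> s = new LinkedHashMap<>();
        s.put("client.transport.sniff", "true");                           // as in the original client code
        s.put("client.transport.ping_timeout", pingTimeout);               // default "5s"
        s.put("client.transport.nodes_sampler_interval", samplerInterval); // default "5s"
        return s;
    }
}
```

The map could then be merged into the client settings, e.g. ImmutableSettings.settingsBuilder().put(TransportClientTimeouts.settings("30s", "10s")).build().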

In the indexing code, you ignore the BulkResponse when you just execute
bulkRequestBuilder.execute().actionGet(). Sooner or later this must go
wrong. Check whether you can add a listener and wait for the cluster to
respond properly before continuing.
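
One way to wait for the cluster before continuing is to cap the number of bulk requests in flight, block the producer until a response frees a slot, and count failed batches instead of ignoring them. BulkProcessor's concurrentRequests setting serves a similar purpose. The sketch below models the idea in plain Java with a Semaphore; the class name and the sender callback (standing in for executing one bulk request and checking the response, e.g. with BulkResponse.hasFailures()) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical backpressure sketch: at most maxInFlight batches are outstanding.
// sender returns true if the batch was indexed without failures.
final class BoundedBulkSubmitter<T> implements AutoCloseable {
    private final Semaphore slots;
    private final ExecutorService pool;
    private final Predicate<List<T>> sender;
    private final AtomicInteger failedBatches = new AtomicInteger();

    BoundedBulkSubmitter(int maxInFlight, Predicate<List<T>> sender) {
        this.slots = new Semaphore(maxInFlight);
        this.pool = Executors.newFixedThreadPool(maxInFlight);
        this.sender = sender;
    }

    // Blocks while maxInFlight batches are still waiting for a response.
    void submit(List<T> batch) throws InterruptedException {
        slots.acquire();
        List<T> copy = new ArrayList<>(batch);
        pool.execute(() -> {
            try {
                if (!sender.test(copy)) {
                    failedBatches.incrementAndGet();
                }
            } catch (RuntimeException e) {
                failedBatches.incrementAndGet(); // a thrown error also counts as a failed batch
            } finally {
                slots.release();                 // free the slot only once the response is in
            }
        });
    }

    int failedBatches() {
        return failedBatches.get();
    }

    // Waits for all outstanding batches to complete.
    @Override
    public void close() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

With maxInFlight = 1 this behaves like the synchronous loop from the first post, except that failed batches are counted rather than silently dropped.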

Also, note that in a scan/scroll request setSize() applies per shard. Check
whether pageSize * number of shards is the right size for a bulk request; it
may get too large.
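
To make the per-shard arithmetic concrete: with SCAN, each scroll round returns up to size * number of primary shards hits, and the original loop puts all of them into a single bulk request. Assuming the 5 shards mentioned later in this thread, pageSize = 10000 means up to 50,000 index actions per bulk. The helper below (hypothetical names) computes that bound and the per-shard size for a desired bulk size.

```java
// Hypothetical helper for SCAN sizing, where setSize() applies per shard.
final class ScanSizing {

    private ScanSizing() {}

    // Upper bound on the hits returned by one scroll round.
    static long hitsPerRound(int sizePerShard, int primaryShards) {
        return (long) sizePerShard * primaryShards;
    }

    // Largest per-shard size that keeps one round within maxBulkActions (at least 1).
    // primaryShards must be positive.
    static int sizePerShardFor(int maxBulkActions, int primaryShards) {
        return Math.max(1, maxBulkActions / primaryShards);
    }
}
```

The same arithmetic gives 10,000 hits per round for setSize(2000) and 1,000 for setSize(200).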

Jörg

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(spancer ray) #3

Hi Jörg,

Thanks for the reply. I switched to BulkProcessor, which improved insert performance.
However, while pulling data from the source cluster, I got the same exception. This time, the problem
was caused by searchScroll. Below is the exception message:

Exception in thread "main" org.elasticsearch.client.transport.NoNodeAvailableException: No node available
	at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:246)
	at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:214)
	at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:106)
	at org.elasticsearch.client.support.AbstractClient.searchScroll(AbstractClient.java:229)
	at org.elasticsearch.client.transport.TransportClient.searchScroll(TransportClient.java:410)
	at org.elasticsearch.action.search.SearchScrollRequestBuilder.doExecute(SearchScrollRequestBuilder.java:92)
	at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:62)
	at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:57)

My fetch size was set to 2000, and my source cluster has 5 shards, which means each scroll request fetches 10,000 documents. I don't know whether this is caused by overloading the cluster with fetch operations; if so, how can I avoid it? Should I reduce my fetch size, or increase it to keep up with the insertion rate? (My BulkProcessor has concurrentRequests set to 5, with the other parameters left at their default values.)

I have 70 million test documents in my source cluster, and the exception happens after 25 million have been migrated. Do you have any idea what could cause this?

Thanks,
spancer


(Jörg Prante) #4

Do you use monitoring tools to watch the cluster nodes?

With them you can find out how resource usage develops until you reach 25
million. I predict you will see the cluster entering a big segment merge
phase, on top of the search load from your scan/scroll requests. Try to
streamline segment merging, either by throttling it or by reducing the
maximum merged segment size (default is 5 GB).
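
In ES 1.x the two knobs mentioned here correspond to store-level merge throttling and the merge policy's max_merged_segment (default 5gb). The sketch below only collects the setting keys in Maps; the class name and the values passed in are hypothetical, and defaults and whether a setting can be changed on a live index vary by version. They could be applied through the cluster update-settings and index update-settings APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: ES 1.x setting keys for throttling merges and capping segment size.
// Values are caller-supplied and illustrative.
final class MergeTuning {

    private MergeTuning() {}

    // Node/cluster-level: throttle store I/O caused by merges.
    static Map<String, String> clusterSettings(String maxBytesPerSec) {
        Map<String, String> s = new LinkedHashMap<>();
        s.put("indices.store.throttle.type", "merge");
        s.put("indices.store.throttle.max_bytes_per_sec", maxBytesPerSec);
        return s;
    }

    // Index-level: upper bound on the size of merged segments (default "5gb").
    static Map<String, String> indexSettings(String maxMergedSegment) {
        Map<String, String> s = new LinkedHashMap<>();
        s.put("index.merge.policy.max_merged_segment", maxMergedSegment);
        return s;
    }
}
```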

You should try using a smaller value for setSize(), maybe 200 instead of
2000, to let the scan/scroll generate more manageable bulk request sizes.

The lifetime of a scroll request, 2 minutes, is very long. During this
time the server must keep the scroll's search context in memory, and these
contexts can easily pile up. I would reduce it to 30 seconds or so. This
saves resources on the cluster nodes, but it must be balanced against the
setSize() parameter to avoid search timeouts.
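
The balance between keep-alive and setSize() can be checked with simple arithmetic: the keep-alive is renewed on every scroll call, so it only has to cover the time needed to process one round of hits before the next call. The sketch below (hypothetical names) does that check. At the 6 million documents per hour reported in the first post (about 1,667 docs/s), a 10,000-hit round takes about 6 seconds and a 1,000-hit round about 0.6 seconds, both well within 30 seconds.

```java
// Hypothetical helper: does a scroll keep-alive cover the processing time of one round?
final class ScrollKeepAlive {

    private ScrollKeepAlive() {}

    // Seconds needed to index one round of hits at the given throughput.
    static double secondsPerRound(long hitsPerRound, double docsPerSecond) {
        return hitsPerRound / docsPerSecond;
    }

    // True if the keep-alive exceeds one round's processing time by the given safety factor.
    static boolean keepAliveIsEnough(long hitsPerRound, double docsPerSecond,
                                     double keepAliveSeconds, double safetyFactor) {
        return secondsPerRound(hitsPerRound, docsPerSecond) * safetyFactor <= keepAliveSeconds;
    }
}
```

A 50,000-hit round (pageSize 10000 on 5 shards) would need about 30 seconds at that rate, so a 30-second keep-alive would leave no margin.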

Jörg
