Initializing_shards - second db start up takes long time


#1

Hello all,
I am working on a solution that uses an embedded Elasticsearch server on a single local machine. The scenario is:
1) Create a cluster with one node and import data: 3 million records in ~180 indexes and 911 shards. The data is available, and search works and returns the expected data:
{
"cluster_name" : "cn1441023806894",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 911,
"active_shards" : 911,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}

Now I shut down the server. This is my console output:
sie 31, 2015 2:51:36 PM org.elasticsearch.node.internal.InternalNode stop
INFO: [testbg] stopping ...
sie 31, 2015 2:51:50 PM org.elasticsearch.node.internal.InternalNode stop
INFO: [testbg] stopped
sie 31, 2015 2:51:50 PM org.elasticsearch.node.internal.InternalNode close
INFO: [testbg] closing ...
sie 31, 2015 2:51:50 PM org.elasticsearch.node.internal.InternalNode close
INFO: [testbg] closed

The database folder is around 2.4 GB.

Now I start the server again, and it takes around 10 minutes to reach status green. Example health during that time:
{
"cluster_name" : "cn1441023806894",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 68,
"active_shards" : 68,
"relocating_shards" : 0,
"initializing_shards" : 25,
"unassigned_shards" : 818
}

After that process, the database folder is ~0.8 GB.

Then I shut down the database and open it again, and now it turns green in 10 seconds.

My configuration:
settings.put(SET_NODE_NAME, projectNameLC);
settings.put(SET_PATH_DATA, projectLocation + "\\" + CommonConstants.ANALYZER_DB_FOLDER);
settings.put(SET_CLUSTER_NAME, clusterName);
settings.put(SET_NODE_DATA, true);
settings.put(SET_NODE_LOCAL, true);
settings.put(SET_INDEX_REFRESH_INTERVAL, "-1");
settings.put(SET_INDEX_MERGE_ASYNC, true);
//the following settings are my attempt to speed up loading on the 2nd startup
settings.put("cluster.routing.allocation.disk.threshold_enabled", false);
settings.put("index.number_of_replicas", 0);
settings.put("cluster.routing.allocation.disk.include_relocations", false);
settings.put("cluster.routing.allocation.node_initial_primaries_recoveries", 25);
settings.put("cluster.routing.allocation.node_concurrent_recoveries", 8);
settings.put("indices.recovery.concurrent_streams", 6);
settings.put("indices.recovery.concurrent_small_file_streams", 4);

The questions:

1) What happens during the second startup? The db folder shrinks from 2.4 GB to 0.8 GB.
2) If this process is necessary, can it be triggered manually, so I can show a nice "please wait" dialog?

The user experience on the second database opening is very bad and I need to change it.
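
To make the "please wait" idea concrete, here is a minimal sketch of a polling loop that reports recovery progress from the shard counts in the health output and stops once the status is green. The class and method names (RecoveryProgressSketch, Health, percentRecovered, waitForGreen) are hypothetical, and the health source is passed in as a plain Supplier so the sketch does not depend on any Elasticsearch class. In a real application the supplier would presumably wrap a cluster health call on the embedded client; that wiring is an assumption and is not shown.

```java
import java.util.function.IntConsumer;
import java.util.function.Supplier;

class RecoveryProgressSketch {

    // Hypothetical holder for the fields of interest from the cluster health output.
    static final class Health {
        final String status;
        final int active;
        final int initializing;
        final int unassigned;

        Health(String status, int active, int initializing, int unassigned) {
            this.status = status;
            this.active = active;
            this.initializing = initializing;
            this.unassigned = unassigned;
        }
    }

    // Share of shards that are already active, as a whole percentage (rounded down).
    static int percentRecovered(Health h) {
        int total = h.active + h.initializing + h.unassigned;
        return total == 0 ? 100 : (int) (100L * h.active / total);
    }

    // Polls the health source until the status is "green" or the timeout expires.
    // Reports progress after every poll; returns true only if green was reached.
    static boolean waitForGreen(Supplier<Health> healthSource, IntConsumer onProgress,
                                long pollMillis, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            Health h = healthSource.get();
            onProgress.accept(percentRecovered(h));
            if ("green".equals(h.status)) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

With the red health sample above (68 active, 25 initializing, 818 unassigned), percentRecovered gives 7, which a dialog could show as a progress bar value. Running the loop on a background thread keeps the UI responsive while shards recover.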

Cheers
Marcin


(Mike Simos) #2

What version of Elasticsearch are you using? Before shutting down you may want to try issuing a synced flush:

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-synced-flush.html

It's only available on Elasticsearch 1.6.0 or later. This may speed up the startup after a shutdown.
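
As a rough sketch of what issuing a synced flush before shutdown could look like, the snippet below sends a POST to the `_flush/synced` REST endpoint described on the linked page, using only the JDK. The class and method names (SyncedFlushSketch, syncedFlushUrl, syncedFlush) are made up for illustration, and the base URL `http://localhost:9200` is an assumption: it only applies if the node has its HTTP module enabled on the default port, which may not be the case for an embedded local node.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class SyncedFlushSketch {

    // Builds the synced-flush URL. A null or empty index name targets all indices.
    static String syncedFlushUrl(String baseUrl, String indexName) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        if (indexName == null || indexName.isEmpty()) {
            return base + "/_flush/synced";
        }
        return base + "/" + indexName + "/_flush/synced";
    }

    // Sends the POST request and returns the HTTP status code of the response.
    static int syncedFlush(String baseUrl, String indexName) throws IOException {
        URL url = new URL(syncedFlushUrl(baseUrl, indexName));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setConnectTimeout(5_000);
        conn.setReadTimeout(60_000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed address: HTTP enabled on the default port of the local node.
        int status = syncedFlush("http://localhost:9200", null);
        System.out.println("synced flush returned HTTP " + status);
    }
}
```

Calling this right before stopping the node, once indexing has finished, is the point at which it is most likely to help.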


#3

Hi Mike,
I was on 1.4 and upgraded to 1.7. Now, after I finish the import into a particular index, I call the synced flush... and it did the trick!
I call:

client.admin().indices().flush(new FlushRequest(idxName));

Thanks for your help!

