I have several clusters, all of which show the same issue: index creation (and deletion) takes minutes.
What are the best places to look to understand why shard initialization/allocation is taking so long?
Do shard allocations across multiple nodes happen in parallel or sequentially?
200 node cluster, local 10x7200rpm JBOD
[2015-11-25 00:05:32,619][WARN ][cluster.service ] [elasearch128] cluster state update task [shard-started ([myindex], node[3grToi99RmGHLGuCqdx38g], [P], s[INITIALIZING], unassigned_info[[reason=INDEX_CREATED], at[2015-11-25T00:00:03.067Z]]), reason [after recovery from gateway]] took 3.9m above the warn threshold of 30s
10 node cluster, decent IOP SAN storage
[2015-11-25 04:05:01,577][WARN ][cluster.service ] [elasearch04] cluster state update task [shard-started ([testindex], node[U2G1ubycShqt-qZzNrMrug], [P], s[INITIALIZING], unassigned_info[[reason=INDEX_CREATED], at[2015-11-25T04:00:59.289Z]]), reason [after recovery from gateway]] took 1m above the warn threshold of 30s
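To see how often this happens and on which nodes, one option is to pull every slow cluster state update out of the logs. This is only a minimal sketch: the regular expression is an assumption derived from the two WARN lines above, and `slow_tasks` is a hypothetical helper name, not part of Elasticsearch.

```python
import re

# Pattern inferred from the two WARN lines above (an assumption, not an
# official log format specification).
SLOW_TASK = re.compile(
    r"\[(?P<ts>[^\]]+)\]\[WARN\s*\]\[cluster\.service\s*\] \[(?P<node>[^\]]+)\] "
    r"cluster state update task \[(?P<task>[\w-]+)"
    r".* took (?P<took>[\d.]+)(?P<unit>ms|s|m|h|d) above the warn threshold"
)

# Multipliers to convert the logged duration into seconds.
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}


def slow_tasks(lines):
    """Return one dict per slow cluster state update task found in `lines`."""
    found = []
    for line in lines:
        match = SLOW_TASK.search(line)
        if match is None:
            continue  # not a slow-task warning
        found.append({
            "timestamp": match.group("ts"),
            "node": match.group("node"),
            "task": match.group("task"),
            "seconds": float(match.group("took")) * UNIT_SECONDS[match.group("unit")],
        })
    return found
```

Feeding it the lines of a node's log file (for example `slow_tasks(open(path))`, with `path` pointing at wherever your logs live) gives a list that can be sorted by `seconds` or grouped by `task` to see which kind of update dominates.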
When indices are created or deleted, or when mappings change, the cluster state needs to be updated and propagated across the cluster. If you have a very large number of indices and/or very large mappings, the cluster state can get large, meaning there is a lot of data that needs to be distributed, which can cause the kind of issues you are describing.
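To get a feel for how large the cluster state actually is, you could save the output of `GET /_cluster/state` to a file and summarize it. The following is a rough sketch, not an official tool: `summarize_cluster_state` is a hypothetical helper, it assumes the usual `metadata.indices` and `routing_table.indices` layout of that response, and the re-serialized JSON length is only a proxy for what is sent between nodes.

```python
import json


def summarize_cluster_state(state):
    """Return rough size indicators for a parsed GET /_cluster/state response."""
    # Assumed layout: metadata.indices.<name>.mappings and
    # routing_table.indices.<name>.shards.<shard_id> -> list of shard copies.
    indices = state.get("metadata", {}).get("indices", {})
    routing = state.get("routing_table", {}).get("indices", {})

    # Count every shard copy (primaries and replicas) in the routing table.
    shard_copies = sum(
        len(copies)
        for index in routing.values()
        for copies in index.get("shards", {}).values()
    )

    # Length of the mappings when re-serialized as JSON text.
    mapping_json_chars = sum(
        len(json.dumps(meta.get("mappings", {}))) for meta in indices.values()
    )

    return {
        "indices": len(indices),
        "shard_copies": shard_copies,
        "mapping_json_chars": mapping_json_chars,
        "state_json_chars": len(json.dumps(state)),
    }
```

For example, after `curl -s 'localhost:9200/_cluster/state' > state.json` (host and port are assumptions), `summarize_cluster_state(json.load(open("state.json")))` shows how many indices and shard copies the master has to track. `GET /_cluster/pending_tasks` is also worth checking to see whether state updates are queuing up on the master.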
Which version of Elasticsearch are you on? How many indices do you have in the cluster(s)?
In Elasticsearch 2.0 this has been improved: only changes to the cluster state need to be replicated, which reduces the amount of data that needs to be exchanged when the cluster state changes.
Currently on 1.7.1.
Unfortunately, each cluster has ~50,000 primary shards and, with one replica each, 100,000 shards in total.
Long story short - the cluster is doing heavy indexing, and the initial shard counts were chosen to distribute that work and keep up with demand. The system is multi-tenant, and the number of tenants grew unchecked, which created this scenario. The cluster started on slow VMs with a slow SAN store, but at least one of them has transitioned to bare metal with local disks, meaning the number of shards can easily be reduced (better hardware). It's also obvious that, at least until we look at 2.X, our shard counts may dictate splitting the cluster apart to meet our needs.
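To put those numbers in per-node terms, here is a small arithmetic sketch using the figures from this thread (~50,000 primaries, one replica, and the 200-node and 10-node clusters mentioned earlier). `shard_totals` is just an illustrative helper name.

```python
def shard_totals(primaries, replicas_per_primary, nodes):
    """Return (total shard copies, average shard copies per node)."""
    total = primaries * (1 + replicas_per_primary)
    return total, total / nodes


# Figures taken from the thread; an even spread across nodes is assumed.
print(shard_totals(50_000, 1, 200))  # (100000, 500.0)
print(shard_totals(50_000, 1, 10))   # (100000, 10000.0)
```

So that is roughly 500 shard copies per node on the 200-node cluster and 10,000 per node on the 10-node cluster, assuming shards are spread evenly.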
Putting aside how big the cluster is - are there any tweaks in 1.7.1 to improve the issue? Our templates/mappings aren't terribly complex - in fact, creating a simple 5 shard index w/o any matching templates also exhibits the issue.
I've taken steps to reduce the shard count for new indices, and older indices will age out over time, but the slow index deletion is making that difficult.
Any help you could provide would be appreciated.
There are a number of indices I am able to delete to get my index/shard count down, but deleting each index is incredibly slow and painful, so even getting back into a healthy state via that route is proving difficult.
Is it possible to shut the entire cluster down, delete the associated index folders from each node and then start back up?
Would I see cluster state or dangling index problems?