Hi,
I'm using daily indices and index templates in my Elasticsearch environment. The date in each index name is taken from a field in the document.
Now I have a problem with bulk indexing:
For example, if I import one year of data, about 365 indices must be created in a short time:
name-2016-01-01
name-2016-01-02
name-2016-01-03
...
name-2016-12-31
This results in poor performance for both index creation and indexing, even if there is only 1 document per day.
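To make the naming scheme above concrete, here is a minimal sketch of how one index name per calendar day is derived for a date range. It is only an illustration: the helper `daily_index_names` is a hypothetical name, not an Elasticsearch API, and the prefix `name` is taken from the listing above.

```python
from datetime import date, timedelta


def daily_index_names(prefix, start, end):
    """Return one index name per calendar day from start to end, inclusive."""
    days = (end - start).days + 1
    return [
        f"{prefix}-{(start + timedelta(days=i)).isoformat()}"
        for i in range(days)
    ]


# One year of data touches one index per day in that year.
names = daily_index_names("name", date(2016, 1, 1), date(2016, 12, 31))
print(names[0], names[-1], len(names))
```

Every name in that list is a separate index that has to be created, and its shards allocated, before the first document for that day can be indexed.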
My cluster: 3 ES Nodes, v5.4.0
Index settings: 2 shards, 0 replicas (during bulk indexing)
I watched the creation process at the shard level and found:
- A maximum of 12 shards are INITIALIZING at the same time (is there a configurable limit?)
- All other shards follow with UNASSIGNED and will be processed one after another
name-2014-03-10 1 p INITIALIZING 0 130b 10.10.0.1 node1
name-2014-03-10 0 p INITIALIZING 0 130b 10.10.0.2 node2
... only up to 12
name-2014-01-02 1 p STARTED 144 40.2kb 10.10.0.2 node2
name-2014-01-02 0 p STARTED 144 35.1kb 10.10.0.3 node3
name-2014-02-01 1 p STARTED 146 19.3kb 10.10.0.1 node1
name-2014-02-01 0 p STARTED 142 17.7kb 10.10.0.2 node2
name-2014-02-05 1 p STARTED 147 20.4kb 10.10.0.1 node1
name-2014-02-05 0 p STARTED 141 18kb 10.10.0.2 node2
name-2014-02-20 1 p STARTED 143 17.6kb 10.10.0.2 node2
name-2014-02-20 0 p STARTED 145 26.6kb 10.10.0.3 node3
... started shards
name-2014-03-20 1 p UNASSIGNED
name-2014-03-20 1 p UNASSIGNED
name-2014-03-21 1 p UNASSIGNED
name-2014-03-21 1 p UNASSIGNED
name-2014-03-22 1 p UNASSIGNED
name-2014-03-22 1 p UNASSIGNED
... other shards waiting
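To put numbers on this, a listing like the one above can be tallied by shard state. This is a minimal sketch that assumes the column order shown in the listing (index, shard, primary/replica flag, state); the function name `count_shard_states` and the sample text are only for illustration.

```python
from collections import Counter


def count_shard_states(cat_shards_text):
    """Count shards per state in a whitespace-separated shard listing."""
    counts = Counter()
    for line in cat_shards_text.splitlines():
        cols = line.split()
        # Keep only real shard rows: <index> <shard number> <p|r> <state> ...
        # Annotation lines such as "... only up to 12" are skipped.
        if len(cols) >= 4 and cols[1].isdigit() and cols[2] in ("p", "r"):
            counts[cols[3]] += 1
    return counts


sample = """\
name-2014-03-10 1 p INITIALIZING 0 130b 10.10.0.1 node1
name-2014-03-10 0 p INITIALIZING 0 130b 10.10.0.2 node2
... only up to 12
name-2014-01-02 1 p STARTED 144 40.2kb 10.10.0.2 node2
name-2014-03-20 1 p UNASSIGNED
"""
print(count_shard_states(sample))
```

Running this repeatedly against fresh output shows how many shards sit in each state while the import is in progress.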
So it took a while until all shards were started. After that, the indexing process is fast again.
Is there a way to optimize this?