This is how ILM is intended to work: you direct all writes to an alias, `desktop-alias` in your case. When the write index for that alias meets the rollover conditions defined in your policy, a new index is created and all writes to `desktop-alias` are directed to the new index. ILM with rollover was generally designed to work best with new data.
The old index is still writable at this point, but only by writing to that index directly rather than through the alias (unless your policy uses the `readonly`, `forcemerge`, or `shrink` actions in the warm phase, which will set the index to read-only via `"index.blocks.write": true`).
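For reference, a minimal policy along those lines might look like the sketch below. The policy name and the rollover thresholds are placeholders for illustration, not recommendations:

```json
PUT _ilm/policy/desktop-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "warm": {
        "min_age": "1d",
        "actions": {
          "readonly": {}
        }
      }
    }
  }
}
```

With the `readonly` action present in the warm phase, each rolled-over index becomes read-only once it enters that phase; omit it if you need old indices to stay directly writable.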
Let me see if I'm understanding correctly:
Your problem is that you have old data which has shard sizes that are too large (or too small, or generally the wrong size).
In order to correct this, you're trying to reindex the old data and use ILM so that it will be re-partitioned into new indices, which have the same shard sizes as new data.
If that's the case, there are a few things to be aware of, and some alternatives.
For one, if you're reindexing into an index/alias that's managed by ILM and relying on ILM's rollover to determine when to create a new index, be aware that ILM checks the rollover conditions periodically - by default, every 10 minutes. A lot of indexing can happen in 10 minutes if you're ingesting at a very high rate (as is typical during a reindex), so your index sizes may be larger than you expect. You can configure how often these checks happen with the `indices.lifecycle.poll_interval` setting, which may help with that problem, but setting it too low can cause additional load on the master node. I would recommend keeping it above `1m`, and setting it back to the default once you're done if you decide to keep going down this path.
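If you do decide to tighten the interval temporarily, it's a dynamic cluster setting, so something like this should work (`1m` here is just an example, per the caveat above):

```json
PUT _cluster/settings
{
  "transient": {
    "indices.lifecycle.poll_interval": "1m"
  }
}
```

Setting the value to `null` in the same request body afterwards restores the default.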
For two, if you're using date math in your index names, as you appear to be, reindexing old data will result in indices containing data whose timestamps don't match the name of the index. This won't cause any technical problems, but it may be confusing. Similarly, if you're directing both old and new data into the same index, you'll have a mix of old and new data - which again won't cause performance issues or anything, but may be confusing and complicate data retention (i.e. if you want to delete all data older than, say, 90 days, but that data is mixed into the same index as data from 2 days ago, it's not as simple as just deleting the indices that contain the old data).
To correct shard sizes in the future, I recommend looking into the Shrink and Split APIs, although Split has some limitations for indices created in the 6.x series that may make it unsuitable for you for now. These APIs are much, much more efficient than reindexing.
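As a rough sketch of the Shrink path (index names and the node name are placeholders, and the target shard count must divide the source's shard count evenly): you first make the index read-only and co-locate a copy of every shard on one node, then shrink it into a new index with fewer primaries.

```json
PUT my-old-index/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "shrink-node-1",
    "index.blocks.write": true
  }
}

POST my-old-index/_shrink/my-old-index-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}
```

Because Shrink hard-links the existing segment files rather than re-writing every document, it's far cheaper than a reindex of the same data.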
If the Shrink/Split APIs aren't able to do what you want, you may also be able to reindex with a script similar to the example here to accomplish your goal with different tradeoffs.
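The general shape of that approach is a `_reindex` call with a script that rewrites `ctx._index` based on each document's timestamp, so old data lands in appropriately named indices. A sketch, assuming your indices match `desktop-*` and `@timestamp` is stored as an ISO-format string (both assumptions on my part):

```json
POST _reindex
{
  "source": {
    "index": "desktop-2018*"
  },
  "dest": {
    "index": "desktop-rewritten"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._index = 'desktop-' + ctx._source['@timestamp'].substring(0, 7)"
  }
}
```

Here the `dest.index` value is effectively a fallback; the script overrides it per document, producing monthly indices like `desktop-2018-06`. The tradeoff is the full cost of reindexing, but you get exact control over which documents land in which index.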
Please correct me if I've misinterpreted you!