What's the best way to break up a large index into multiple smaller, equally sized indexes?
I've looked at using reindex with a query, but it seems extremely slow:
I do that already.
I noticed that when I run a reindex with the above settings, it takes an extremely long time before any documents are written. I think it's slowly working its way through the large index until it finds matching documents, and only then reindexes them. Since I'd need to repeat this operation several times, it would take far too long.
So I'm wondering if there's an 'official' way to break up an index into smaller indexes.
So the best way to handle this one-off job is with ILM? Is that how you'd approach it?
And there is no 'best practice'; people just do it however they figure out?
That's how I would do it, yes.
It allows you to define and automatically create the indices based on the size you want. It also makes sure they are easily queryable via an alias.
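A minimal sketch of what that setup involves, assuming 7.x-style ILM with a rollover alias: a policy whose hot phase contains a rollover action, plus a first index (name ending in `-000001`) that is the write index of the alias. The policy name `packetbeat-split` and alias `packetbeat-archive` are hypothetical; the thresholds mirror the 50G / 1 day values mentioned later in the thread. The script only prints request bodies, which you would send with curl or Kibana Dev Tools.

```python
import json


def ilm_rollover_policy(max_size="50gb", max_age="1d"):
    """Body for PUT _ilm/policy/<name>: a hot phase with a rollover action."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_size": max_size, "max_age": max_age}
                    }
                }
            }
        }
    }


def bootstrap_index(policy_name, alias):
    """Body for PUT <alias>-000001: first index, marked as the alias's write index."""
    return {
        "settings": {
            "index.lifecycle.name": policy_name,
            "index.lifecycle.rollover_alias": alias,
        },
        "aliases": {alias: {"is_write_index": True}},
    }


if __name__ == "__main__":
    # Hypothetical policy and alias names.
    print("PUT _ilm/policy/packetbeat-split")
    print(json.dumps(ilm_rollover_policy(), indent=2))
    print("PUT packetbeat-archive-000001")
    print(json.dumps(bootstrap_index("packetbeat-split", "packetbeat-archive"), indent=2))
```

Queries then go through the alias, which spans every index the rollover creates.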
I'm trying it out: I created an ILM policy and attached it to this index. I'll see how it turns out in the morning!
Thanks for the tip.
It does seem a bit nicer than running lots of curl API calls with queries etc.
What is the reason for breaking it up? If the shards are too large and affecting performance you can simply use the split index API to increase the number of primary shards. Querying a single index with X shards is not much different to querying X indices with 1 primary shard each from a performance perspective.
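The split route looks roughly like this, as a sketch: the source index must first be made read-only with `index.blocks.write`, then `_split` creates a new index with more primary shards. The index names and shard counts are hypothetical. Which target counts are actually allowed also depends on `index.number_of_routing_shards`; the code only checks the basic rule that the new count is a larger multiple of the old one.

```python
import json


def split_requests(source, target, current_shards, target_shards):
    """(method, path, body) requests for the split index API workflow."""
    # Elasticsearch further constrains the factor via index.number_of_routing_shards;
    # this only checks the basic "larger multiple" rule.
    if target_shards <= current_shards or target_shards % current_shards:
        raise ValueError("target shard count must be a larger multiple of the current count")
    return [
        # Splitting requires the source index to be read-only first.
        ("PUT", f"/{source}/_settings", {"index.blocks.write": True}),
        ("POST", f"/{source}/_split/{target}",
         {"settings": {"index.number_of_shards": target_shards}}),
    ]


if __name__ == "__main__":
    # Hypothetical index names and shard counts.
    for method, path, body in split_requests("packetbeat-big", "packetbeat-big-split", 1, 8):
        print(method, path)
        print(json.dumps(body, indent=2))
```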
The index contains Packetbeat data and was allowed to grow too large without ILM rotating it. A lot of that data is flow data, which isn't needed. The index covers a period that's got valuable historical data.
I've been trying to reindex it into another index without the flow data, but it goes incredibly slowly: after 4 days it hasn't done a quarter. I've done the same with other indexes of about 100G and they went very fast, in a matter of hours. I reasoned that if I first break this large index down into indexes of 100G or so each, it could go faster. Divide and conquer. I have limited time before I move on to another job, and I'd like to get this done before then.
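One way to do the divide-and-conquer step directly, sketched under assumptions: cut the index's time span into equal windows and run one `_reindex` per window, each filtering on `@timestamp` and excluding flow events. It assumes flows are tagged `type: flow` (check the mapping for your Packetbeat version) and that documents have a `@timestamp` field; the index names and date range are hypothetical. `slices=auto` and `wait_for_completion=false` are standard `_reindex` URL parameters. Equal time windows will not be equal in size if traffic varies, and whether this beats a single reindex is something to measure rather than assume.

```python
import json
from datetime import datetime, timezone


def time_windows(start, end, n):
    """Split [start, end) into n consecutive windows of equal duration."""
    step = (end - start) / n
    windows = []
    for i in range(n):
        lo = start + i * step
        # Use the exact end for the last window to avoid rounding gaps.
        hi = end if i == n - 1 else start + (i + 1) * step
        windows.append((lo, hi))
    return windows


def reindex_body(source, dest, gte, lt, drop_type="flow"):
    """Body for POST _reindex copying one time window, minus flow events.

    Assumption: flow records carry type: "flow"; verify against your mapping.
    """
    return {
        "source": {
            "index": source,
            "size": 5000,  # scroll batch size per request (default is 1000)
            "query": {
                "bool": {
                    "filter": [
                        {"range": {"@timestamp": {"gte": gte.isoformat(),
                                                  "lt": lt.isoformat()}}}
                    ],
                    "must_not": [{"term": {"type": drop_type}}],
                }
            },
        },
        "dest": {"index": dest},
    }


if __name__ == "__main__":
    # Hypothetical index names and date range.
    start = datetime(2021, 1, 1, tzinfo=timezone.utc)
    end = datetime(2021, 3, 1, tzinfo=timezone.utc)
    for i, (lo, hi) in enumerate(time_windows(start, end, 6), 1):
        print("POST _reindex?slices=auto&wait_for_completion=false")
        print(json.dumps(reindex_body("packetbeat-big", f"packetbeat-part-{i}", lo, hi)))
```

Each call returns a task ID that can be checked with the tasks API while the copies run.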
I made an ILM policy with a hot phase that rolls over at a maximum index size of 50G or a maximum age of 1 day, and attached this index to it. That part seems fine: the index shows as being managed by this policy.
But it doesn't appear to have kicked in and done anything with the index. Do I need to trigger it manually?
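A hedged sketch of what ILM typically needs here. The rollover action expects the managed index to have `index.lifecycle.rollover_alias` set and to be the write index of that alias, and the index name should end in a number (e.g. `-000001`) so the next index can be named; without these, `GET <index>/_ilm/explain` usually reports a step error instead of rolling over. ILM also only checks indices periodically (`indices.lifecycle.poll_interval`, 10 minutes by default). Note that rollover only sends new writes to a fresh index; as far as I know it does not move documents already in the large index. The index and alias names below are hypothetical.

```python
import json


def attach_rollover_alias(index, alias):
    """(method, path, body) requests so ILM's rollover action can run on `index`."""
    return [
        # Make the existing index the write index of a rollover alias.
        ("POST", "/_aliases",
         {"actions": [{"add": {"index": index, "alias": alias,
                               "is_write_index": True}}]}),
        # Tell ILM which alias to roll over.
        ("PUT", f"/{index}/_settings",
         {"index.lifecycle.rollover_alias": alias}),
        # Shows the current phase/action/step, including any step error.
        ("GET", f"/{index}/_ilm/explain", None),
    ]


if __name__ == "__main__":
    # Hypothetical index and alias names.
    for method, path, body in attach_rollover_alias("packetbeat-big-000001", "packetbeat-write"):
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```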