Hi Daithi,
Its better to have shard size around 50gb for best performance. If we have index with shard size in TB then it quite difficult to handle when you have take the snapshot or rolling over from one data tier to another data tier.
Split vs reindex:
If you go with reindexing it will take so much time for example like if you want reindex 1GB of data will take around 4-5 minutes so in case of TB data it will gona take days for reindexing on the other hand if you go with split api it will quickly split the index with desired primary shard that you have provide in the split api.
Also one thing when you apply split api make sure you must have good amount of storage because in the begening index try to allocated all the shard on different node and then allocate the data so in this process you might see your storage get incresed by maybe 3-4 times but it will come to its original state by some time.
here are the link for reference
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.