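Rollover conditions are only checked periodically (every 10 minutes by default, if I recall correctly), not on every write. That is fine for production-sized shards, but it can be confusing when you are testing with small thresholds like you are now, because the index will keep growing past the threshold until the next check runs. Also be aware that the shard size can grow and shrink as merging occurs, so the size threshold is approximate rather than exact.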
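For testing, one option is to shorten the check interval. The sketch below assumes the rollover is driven by ILM, where the interval is the cluster setting `indices.lifecycle.poll_interval`. The function name `build_poll_interval_request`, the `http://localhost:9200` address, and the `10s` value are placeholders for illustration, not something from your setup. It only builds the request; sending it is left commented out.

```python
import json
import urllib.request


def build_poll_interval_request(base_url: str, interval: str = "10s") -> urllib.request.Request:
    """Build (but do not send) a PUT _cluster/settings request.

    Assumes rollover is managed by ILM, so the check interval is
    controlled by indices.lifecycle.poll_interval.
    """
    body = {"persistent": {"indices.lifecycle.poll_interval": interval}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical local test cluster without authentication.
    req = build_poll_interval_request("http://localhost:9200")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply the setting, uncomment:
    # urllib.request.urlopen(req)
```

A short interval like this is only sensible on a test cluster; set it back to the default once you are done.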