Keeping each site's logs in its own index would be neater and might net you a small performance boost because you don't have to filter on site, but it's not a good idea for two reasons:
- Each shard has a non-trivial overhead.
- Deleting old documents from an index is way, way more work for Elasticsearch than deleting old indexes.
Given these, any time you can turn your problem into one of time series indexes, you'll tend to do better.
It's fine, for instance, to put your biggest customers in their own indexes. It's just that you can't have tons and tons of indexes, because then you'll have tons and tons of shards.
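To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The function name is made up for illustration, and the defaults (5 primary shards and 1 replica per index) are an assumption matching what older Elasticsearch releases shipped with; plug in your own settings.

```python
def total_shards(index_count, primaries_per_index=5, replicas=1):
    """Total shard copies the cluster has to carry.

    Each index has `primaries_per_index` primary shards, and every
    primary has `replicas` additional copies.
    The defaults are assumptions, not a recommendation.
    """
    return index_count * primaries_per_index * (1 + replicas)


# One index per site for 1000 sites, with the assumed defaults:
per_site = total_shards(1000)  # 10000 shard copies

# One shared index per week for a year, same settings:
per_week = total_shards(52)  # 520 shard copies
```

Since every one of those shards carries its own overhead, the per-site layout costs far more than the weekly one for the same data.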
The "index per week" thing is one of those balancing acts: the overhead of having lots of indexes is worth it because we can more easily delete stuff once its retention period has passed. There are a few other nice things too. Writing to empty indexes is faster than writing to full ones. Once you know you'll never modify an index again, you can _optimize it to squash it into a single segment for faster searching and, typically, less disk usage. Also, each index can only hold a maximum of Java's Integer.MAX_VALUE documents per shard, so about 2 billion. Time series indexes give you a convenient place to say "now I'm making a new one" so you don't run into that.
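As a rough illustration of why weekly indexes make retention easy, here is a minimal Python sketch. The "logs-YYYY.WW" naming scheme, the function names, and the retention value are all assumptions for the example, not anything Elasticsearch requires. It only works out which index a document belongs to and which indexes are past retention; actually deleting them is then one cheap index delete per name.

```python
from datetime import date, timedelta


def weekly_index_name(day, prefix="logs"):
    """Name of the weekly index a document dated `day` belongs to.

    Uses the ISO year and week number, e.g. "logs-2014.01".
    The naming scheme is hypothetical.
    """
    iso_year, iso_week, _ = day.isocalendar()
    return f"{prefix}-{iso_year}.{iso_week:02d}"


def expired_indexes(names, today, retention_days, prefix="logs"):
    """Return the index names whose whole week is older than the retention period."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        # Strip "<prefix>-" and split the remaining "YYYY.WW" part.
        year_str, week_str = name[len(prefix) + 1:].split(".")
        monday = date.fromisocalendar(int(year_str), int(week_str), 1)
        last_day = monday + timedelta(days=6)  # Sunday of that ISO week
        if last_day < cutoff:
            expired.append(name)
    return expired


names = ["logs-2014.01", "logs-2014.02", "logs-2014.10"]
to_delete = expired_indexes(names, today=date(2014, 3, 10), retention_days=30)
# to_delete == ["logs-2014.01", "logs-2014.02"]
```

Nothing here has to look inside an index to find old documents, which is the whole point: the index name alone tells you whether it can go.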