Number of indexes vs index size


(Dan Fairs) #1

Hi,

What's the tradeoff between the number of indexes and the number of documents in each index?

We're going to be indexing tweets about events. Some events have just a few tweets (zero, or fewer than ten), whereas some events have millions. Most events will tend towards the lower end of the spectrum - it's rare to get an event with more than a couple of hundred thousand tweets.

We tend to do analysis on a per-event basis, and we also envisage (eventually) expiring old events from the system (we can always reload the data if we need it). We expect to have around 700 events per day.

Do you guys have any thoughts on what a sensible index strategy might be? I don't think One Big Index is a good design choice, from an operational point of view - as we're exploring ES I expect we'll occasionally have to reindex things, and I'd like to be able to do that incrementally, event by event (or at least by groups of events - say a day's worth).

However, I've also seen references in the docs to an overhead incurred by each index, though I've not seen what that overhead is. I therefore suspect that the 700-odd indexes per day created by an index-per-event strategy is also unwise.

I'm inclined, therefore, to start out with an index per day. Each daily index will usually hold a few hundred thousand documents.
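
To make the comparison concrete, here is a minimal Python sketch of the two strategies. It is illustrative only: the `tweets-YYYY.MM.DD` naming scheme, the function names, and the idea of keying each document by a single day are assumptions, not anything prescribed by Elasticsearch or stated in this thread. It makes no calls to Elasticsearch; it only shows the index-count arithmetic and how whole-day expiry falls out of a daily naming scheme.

```python
from datetime import date, timedelta

EVENTS_PER_DAY = 700  # figure quoted in the post above


def daily_index_name(day: date, prefix: str = "tweets") -> str:
    """Name of the (hypothetical) daily index for documents belonging to `day`."""
    return f"{prefix}-{day:%Y.%m.%d}"


def index_count(retention_days: int, per_event: bool) -> int:
    """Number of live indexes once `retention_days` of data are held."""
    per_day = EVENTS_PER_DAY if per_event else 1
    return retention_days * per_day


def expired_indexes(today: date, retention_days: int, oldest: date,
                    prefix: str = "tweets") -> list[str]:
    """Daily index names that fall outside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    names = []
    day = oldest
    while day < cutoff:
        names.append(daily_index_name(day, prefix))
        day += timedelta(days=1)
    return names


if __name__ == "__main__":
    print(index_count(30, per_event=True))   # index per event, 30 days kept
    print(index_count(30, per_event=False))  # index per day, 30 days kept
    print(expired_indexes(date(2012, 7, 10), 7, date(2012, 7, 1)))
```

With a 30-day window, index-per-event means tens of thousands of live indexes, while index-per-day means 30, and expiring or reindexing a day's worth of events becomes an operation on a single named index.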

Does that sound reasonable? As I said, I'm really trying to understand what the nature of the tradeoff between index size and quantity of indexes is.

Cheers,
Dan

Dan Fairs | dan.fairs@gmail.com | @danfairs | www.fezconsulting.com


(Clinton Gormley) #2

> Does that sound reasonable? As I said, I'm really trying to understand
> what the nature of the tradeoff between index size and quantity of
> indexes is.

That does sound reasonable. Also, check out kimchy's talk at Berlin
Buzzwords a month ago - he talks about scaling strategies:

http://www.elasticsearch.org/videos/2012/06/05/big-data-search-and-analytics.html

clint


(Dan Fairs) #3

Thanks for the feedback, Clint - good to know I'm at least in the right ballpark!

I shall don the headphones forthwith.

Cheers,
Dan


