Choosing index count balance


(Ian Marsman) #1

Hello. I am working on an application to index documents using
ElasticSearch. I estimate that a month's worth of indexes will run to about
1.5 GB. The vast majority (90% at least) of queries are made against items
for the previous 24 hours, which is about 60 MB in indexes. It thus seems
logical to create per-day indexes. Are there other considerations though in
creating hundreds of indexes (for years of data)? Is there significant
overhead such as index load time etc. in making a query that would cover
30+ indexes? The servers I am considering for ElasticSearch have 7.5 GB of
RAM (m1.large EC2). Might per-month indexes (1.5 GB) be OK for this sort of
situation or perhaps 3/month? I don't need anyone to tell me what to do but
I would value any thoughts on advantages and tradeoffs.
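
To make the options above concrete, here is a rough sizing sketch. It uses only the 1.5 GB/month estimate from this post; the 30-day month, the 3-year retention period and the function name are assumptions made for illustration, so the per-day figure comes out near 50 MB rather than the 60 MB quoted above.

```python
# Figure taken from the post: roughly 1.5 GB of index data per month.
MONTHLY_INDEX_GB = 1.5
DAYS_PER_MONTH = 30  # assumption: average month length, for rough sizing only


def index_plan(years, indices_per_month):
    """Return (number of indices, approximate size of each index in MB)."""
    count = years * 12 * indices_per_month
    size_mb = MONTHLY_INDEX_GB * 1000 / indices_per_month
    return count, size_mb


# Compare the three schemes mentioned in the post over a hypothetical 3 years.
for label, per_month in [("per-day", DAYS_PER_MONTH), ("3/month", 3), ("per-month", 1)]:
    count, size_mb = index_plan(years=3, indices_per_month=per_month)
    print(f"{label}: {count} indices, ~{size_mb:.0f} MB each")
```

The numbers only show how the index count and the size per index trade off against each other; they say nothing about query latency, which has to be measured.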

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Luca Cavanna) #2

Hi Ian,
querying 30 indices of one shard each or one index made of 30 shards is
exactly the same in terms of the number of shards that need to execute the
query.
Time-based indexing seems to be a good fit in your case, but I'd suggest
running some performance tests to understand the capacity of a single
shard with your data, queries and hardware. That way you should be able to
tell whether an index per day would be a waste (each index needs at least
one shard, and you may not index enough documents in a single day to
justify one).
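
As a minimal sketch of the point about shard counts, the snippet below builds the names of 30 daily indices and compares the number of shards a search would touch in each layout. The `docs-` prefix, the `YYYY.MM.DD` naming pattern and both function names are hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta


def daily_index_names(end_day, days, prefix="docs-"):
    """Names of the daily indices covering `days` days, newest first."""
    names = []
    for offset in range(days):
        day = end_day - timedelta(days=offset)
        names.append(prefix + day.strftime("%Y.%m.%d"))
    return names


def shards_queried(index_count, primary_shards_per_index):
    """A search runs on one copy of every primary shard of every target index."""
    return index_count * primary_shards_per_index


names = daily_index_names(date(2013, 10, 17), 30)

# 30 one-shard daily indices fan out to the same number of shards
# as a single index created with 30 primary shards.
assert shards_queried(len(names), 1) == shards_queried(1, 30)

# A comma-separated list of names can be used as the search target.
search_target = ",".join(names)
```

With daily indices, a query for the last 24 hours only needs to target the newest one or two names, which is where the time-based layout pays off for the access pattern described in the question.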

Have you had the chance to watch this talk already:
http://vimeo.com/44716955 ? It elaborates on some advanced data design
patterns that you can apply depending on how your data flows into your
system. That doesn't mean that you need to use custom routing, but it's
something that you might want to consider or at least be aware of.

Cheers
Luca

On Thursday, October 17, 2013 2:10:52 AM UTC+2, Ian Marsman wrote:



(system) #3