You are currently letting ES handle sharding for you, but the rolling-indexes
approach (aka time-sliced indexes), where each index contains all the data for
a given period of time and is named after that period, makes much more sense.
In other words: perform the sharding yourself at the index level, and use
aliases or multi-index queries on top of that.
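As a minimal sketch of that idea (the "logs-" prefix and the 3-day search window are assumptions for illustration, not your actual naming), daily index names and a multi-index query target could be derived like this:

```python
from datetime import date, timedelta

def daily_index_name(day, prefix="logs"):
    """Name an index after the day it covers, e.g. logs-2014-04-24."""
    return f"{prefix}-{day.isoformat()}"

def search_indexes(today, days_back=3, prefix="logs"):
    """Index names to hit for the last `days_back` days (a multi-index target,
    or the set an alias like `logs-recent` would point at)."""
    return [daily_index_name(today - timedelta(days=i), prefix)
            for i in range(days_back)]

today = date(2014, 4, 24)
print(",".join(search_indexes(today)))
# -> logs-2014-04-24,logs-2014-04-23,logs-2014-04-22
```

Queries for recent data then touch only a handful of small indexes instead of a huge monthly one; an alias updated once a day hides the naming scheme from clients.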
This will also help with retiring old indexes. The high-CPU scenario is
probably due to the many deletes and segment merges that happen under the hood
(and possibly a wrong setting for the Java heap). With the aforementioned
approach you can simply archive or delete an entire index instead of relying
on TTLs or delete-by-query processes.
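A hypothetical retention helper, assuming the date-suffixed naming sketched above (the prefix and retention window are made-up parameters): it picks whole indexes past the cutoff, each of which can then be dropped with a single delete-index call rather than per-document deletes.

```python
from datetime import date, timedelta

def expired_indexes(index_names, today, retention_days, prefix="logs"):
    """Return index names whose covered day is older than the retention
    window; each one can be archived or deleted as a single unit."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        # Parse the date back out of the assumed "<prefix>-YYYY-MM-DD" name.
        day = date.fromisoformat(name[len(prefix) + 1:])
        if day < cutoff:
            expired.append(name)
    return expired

names = ["logs-2014-01-01", "logs-2014-04-20"]
print(expired_indexes(names, date(2014, 4, 24), retention_days=90))
# -> ['logs-2014-01-01']
```

Deleting an index is a near-instant metadata operation and frees its segments wholesale, which avoids the delete-then-merge churn that TTLs cause.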
Deciding on the optimal index size in that scenario depends heavily on your
data and usage patterns, and on a lot of experimenting.
That answers questions 1 & 2.
- As for 3: definitely upgrade; 0.20 is a very old version.
--
Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/
On Thu, Apr 24, 2014 at 3:48 PM, ran@taykey.com wrote:
Hi all,
I'm looking for the recommended solution for my situation.
We have time-based data.
Each day we index around 43,000,000 documents. Each document weighs
around 0.6k.
Our cluster contains 8 nodes (Amazon m1.xlarge and m3.xlarge machines)
with 12GB of memory each, running ES version 0.20.5.
We have an index per month (~1,290,000,000 documents, 700GB; 1.5TB
with replicas); each index has 8 shards with 1 replica (16 shards in total).
We are currently storing 4 months of data.
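For reference, a back-of-the-envelope check (decimal units assumed) shows these figures are roughly self-consistent:

```python
docs_per_day = 43_000_000
doc_size_kb = 0.6

daily_gb = docs_per_day * doc_size_kb / 1_000_000  # KB -> GB (decimal)
monthly_gb = daily_gb * 30                         # ~30-day month

print(round(daily_gb, 1), round(monthly_gb))
# -> 25.8 774  (GB/day, GB/month; close to the stated ~700GB per index)
```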
At the end of every month we experience high CPU usage on the machines
and many heap-size failures on the nodes (which force a full cluster
restart).
Most of our queries search documents within the last 3 days.
We have several options in mind, and want to choose the best one:
- Split the index into daily indexes, 4 indexes per month (weekly), or 2
indexes per month. We are not sure how much overhead this would add; is a
daily index exaggerated?
- Maybe adding shards can solve our problem? What is the recommended
number of shards for our amount of data?
- Could upgrading to the latest version of ES help solve the problem?
Thanks!
Best regards, Ran
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/d8bc3503-f36f-4f18-99bc-6a4e000045e5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.