Recommended way to reduce overload on ES

ran · April 24, 2014, 12:48pm

Hi all,

I'm looking for the recommended solution for my situation.

We have time-based data.
Each day we index around 43,000,000 documents. Each document weighs around
0.6KB.
Our cluster contains 8 nodes (Amazon - m1.xlarge and m3.xlarge machines)
with 12GB mem, running ES version 0.20.5.
We have one index per month (~1,290,000,000 documents, 700GB, or 1.5TB with
replicas). Each index has 8 shards with 1 replica (16 shards in total).
We are currently storing 4 months of data.

At the end of every month we experience high CPU usage on the
machines and many heap size failures on the nodes (which force a full
cluster restart).
Most of our queries search documents from the last 3 days.

We have several options in mind and want to choose the best one:

1. Split the index into daily indexes, 4 indexes per month (weekly), or 2
   indexes per month. We are not sure whether this adds a lot of overhead.
   Is a daily index excessive?
2. Maybe adding shards would solve our problem? What is the recommended
   number of shards for our amount of data?
3. Would upgrading to the latest version of ES help solve the problem?

Thanks!

Best regards, Ran

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d8bc3503-f36f-4f18-99bc-6a4e000045e5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Itamar_Syn_Hershko · April 24, 2014, 12:55pm

You are currently letting ES handle sharding for you, but the
rolling-indexes approach (aka time-sliced indexes), where each index
contains all data for a given period of time and is named after that period,
makes much more sense. Read: perform the sharding yourself at the index
level, and use aliases or multi-index queries to maintain that.
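
To make the naming idea concrete, here is a minimal sketch of how a client could work out which daily indexes cover "the last 3 days" and put them into one multi-index search path (comma-separated index names). The `events` prefix, the `YYYY.MM.DD` suffix and both helper functions are assumptions made up for illustration; nothing in this thread specifies them.

```python
from datetime import date, timedelta
from typing import List


def daily_index_names(prefix: str, end: date, days: int) -> List[str]:
    """Return daily index names for the `days` days ending on `end`, oldest first."""
    return [
        f"{prefix}-{(end - timedelta(days=offset)).strftime('%Y.%m.%d')}"
        for offset in range(days - 1, -1, -1)
    ]


def search_path(index_names: List[str]) -> str:
    """Build a multi-index search path from comma-separated index names."""
    return "/" + ",".join(index_names) + "/_search"


# Hypothetical prefix "events"; the date is the day this thread was posted.
names = daily_index_names("events", date(2014, 4, 24), 3)
print(search_path(names))
# /events-2014.04.22,events-2014.04.23,events-2014.04.24/_search
```

A query for the last 3 days then touches only three small indexes instead of the whole monthly one. An alias that is updated once a day could hide the name list from the application.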

This will help with retiring old indexes. The high CPU scenario is probably
due to a lot of deletes and segment merges that happen under the hood (and
possibly a wrong setting for the Java heap). Using the aforementioned
approach means you can just archive or delete an entire index and not use
TTLs or delete-by-query processes.
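
As a rough sketch of what retiring whole indexes could look like, the function below only decides which daily indexes have fallen out of the retention window. The index naming (`events-YYYY.MM.DD`), the 120-day window (roughly the 4 months mentioned in the question) and the function itself are assumptions for illustration. The actual removal would be one index-level delete request per returned name, which is not shown here.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def expired_indexes(index_names: Iterable[str], prefix: str,
                    today: date, keep_days: int) -> List[str]:
    """Return names of daily indexes older than `keep_days` days before `today`."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue  # not one of our time-sliced indexes
        try:
            day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date; leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


# Hypothetical index list; only the first entry is older than 120 days.
existing = ["events-2013.12.20", "events-2014.01.15",
            "events-2014.04.24", "other-2013.01.01"]
print(expired_indexes(existing, "events", date(2014, 4, 24), 120))
# ['events-2013.12.20']
```

Dropping an index this way removes its files in one step, so no per-document deletes or follow-up merges are needed for the old data.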

Deciding on the optimal size of an index in that scenario highly depends on
your data, usage patterns and a lot of experimenting.

That answers questions 1 and 2.

As for 3: definitely. 0.20 is a very old version.

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

ran · April 24, 2014, 1:37pm

Hi Itamar, thanks for the quick reply.

Let me verify that I understood you correctly: are you recommending a
daily index for my scenario? Can ES handle that many indexes (365, for
instance), or would that add overhead?

Itamar_Syn_Hershko · April 24, 2014, 1:45pm

Elasticsearch can handle many open indexes on a cluster, but the advice is
to keep their number low per machine. I would try to aim for indexes of
around 10GB each; if this means a weekly index, so be it.

My advice is to use time-sliced indexes, not about the time span on which
to slice them. There's no rule of thumb for that one, I'm afraid.
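
For the experimenting part, a back-of-the-envelope calculation can at least narrow the options. The sketch below uses the figures from the original question (43,000,000 documents per day at about 0.6KB each, 1 replica) and compares a few slicing choices. The 120-day retention and the shards-per-index values are assumptions picked for illustration, and raw document size is only a rough stand-in for on-disk index size.

```python
from math import ceil

DOCS_PER_DAY = 43_000_000   # from the original question
DOC_SIZE_KB = 0.6           # from the original question
REPLICAS = 1                # from the original question
RETENTION_DAYS = 120        # assumption: roughly 4 months


def plan(days_per_index: int, shards_per_index: int) -> dict:
    """Estimate index size and shard counts for one slicing choice."""
    daily_gb = DOCS_PER_DAY * DOC_SIZE_KB / 1_000_000  # KB to GB (decimal)
    index_count = ceil(RETENTION_DAYS / days_per_index)
    return {
        "index_gb": daily_gb * days_per_index,
        "shard_gb": daily_gb * days_per_index / shards_per_index,
        "index_count": index_count,
        "total_shards": index_count * shards_per_index * (1 + REPLICAS),
    }


# Shard counts per index are hypothetical, chosen only to compare options.
for label, days, shards in [("daily", 1, 1), ("weekly", 7, 2), ("monthly", 30, 8)]:
    p = plan(days, shards)
    print(f"{label:8s} {p['index_gb']:7.1f} GB/index "
          f"{p['shard_gb']:6.1f} GB/shard "
          f"{p['index_count']:4d} indexes {p['total_shards']:4d} shards")
```

Under these assumptions a single day is already about 26GB of raw documents, so the size per index and the total number of shards pull in opposite directions. That trade-off is what the experiments would have to settle.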

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

warkolm · April 24, 2014, 11:42pm

Upgrade ES! That is a very, very old version and there are numerous
performance improvements in the later versions.
You will need to reindex your data though, as the underlying Lucene version
has changed as well. You can take advantage of that process by building a
new cluster with the latest version, then migrating the data over and
making tweaks to suit your structure.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

Itamar_Syn_Hershko · April 24, 2014, 11:48pm

There's no need to reindex; it is enough to do a full cluster restart after
upgrading the binaries, and ES/Lucene will take care of the rest.

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

warkolm · April 25, 2014, 12:02am

Really?
I've seen most people recommend this, especially given such a large jump in
the ES and Lucene versions.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 25 April 2014 09:48, Itamar Syn-Hershko itamar@code972.com wrote:

There's no need to reindex, it is enough to do full cluster restart after
upgrading the binaries and ES/Lucene will take care of the rest

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

On Fri, Apr 25, 2014 at 2:42 AM, Mark Walkom markw@campaignmonitor.comwrote:

Upgrade ES! That is a very very old version and there are numerous
performance improvements in the later versions.
You will need to reindex your data though, the underlying lucene version
has changed as well. You can leverage that process by building a new
cluster with the latest version, then migrating data over, and make tweaks
to suit your structure.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 24 April 2014 23:45, Itamar Syn-Hershko itamar@code972.com wrote:

Elasticsearch can handle many open indexes on a cluster, but the advice
is to keep their number low per machine. I would try to aim for indexes at
a size of around 10GB, if this means a weekly index - so be it.

My advice is on using time sliced indexes, not on the time span on which
to slice them. There's no thumb rule for that one I'm afraid.

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

On Thu, Apr 24, 2014 at 4:37 PM, ran@taykey.com wrote:

Hi Itamar , thanks for the quick reply.

Let me verify that I understood you correctly, you recommended to use
an daily index for my scenario? ES can handle that amount of indexes (365
for instance) ? or that procedure could make an overhead ?

On Thursday, April 24, 2014 3:55:22 PM UTC+3, Itamar Syn-Hershko wrote:

You are currently letting ES handle sharding for you, but using the
rolling-indexes approach (aka time sliced indexes) where indexes contain
all data for a given period of time and named after that period makes much
more sense. Read: perform the sharding yourself on the index level, and use
aliases or multi-index queries to maintain that.

This will help with retiring old indexes. The high CPU scenario is
probably due to a lot of deletes and segment merges that happen under the
hood (and possibly a wrong setting for the Java heap). Using the
aforementioned approach means you can just archive or delete an entire
index and not use TTLs or delete-by-query processes.

Deciding on the optimal size of an index in that scenario highly
depends on your data, usage patterns and a lot of experimenting.

That's to answer 1 & 2

Definitely, 0.20 is a very old version

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

On Thu, Apr 24, 2014 at 3:48 PM, r...@taykey.com wrote:

Hi all,

I'm looking for the recommended solution for my situation.

We have a time based data.
Each day we index around 43,000,000 documents. Each document weighs
around 0.6k.
Our cluster contains 8 nodes (Amazon - m1.xlarge and m3.xlarge
machines) with 12GB mem, running ES version 0.20.5.
We have an index per month. (~1,290,000,000 documents, 700GB {1.5
Tera with rep), each index has 8 shards with 1 replica (total of 16 shards)
We are currently storing 4 months of data.

At the end of every month we experienced an high cpu usage on the
machines, and many heap size failures on the nodes (which cause a full
cluster restart).
Most of our queries search the documents within a range of 3 last
days.

We have several options in head, and wanted to choose the best one:

split the index to daily index / 4 indexes per month (weekly) /
2 indexes per month - we are not sure if there is a lot of overhead if we
do that. is daily index is exaggerated ?

Maybe adding shards can solve our problem ? what is
the recommended number of shards for our amount of data?

Upgrade to the latest version of ES could help solve that
problem?

Thanks!

Best regards, Ran

Itamar_Syn_Hershko · April 25, 2014, 12:09am

Lucene is and has always been backwards-compatible. On the first merge of
an index created with an earlier version, it gets upgraded to the latest format.

And you can't always reindex (think huge installations), so I believe the
ES core team's intent is actually to not require it.

You may want to upgrade gradually (to 0.90 and then to 1.x) just to be safe, but
you don't have to reindex.

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/
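
As a small illustration of the gradual upgrade suggested above, the Python sketch below lists which hops remain for a given version string. The helper name `remaining_hops` and the two-hop list are assumptions that simply mirror the "0.90 and then 1.x" wording; nothing here calls Elasticsearch itself.

```python
def remaining_hops(current_version: str) -> list:
    """Return the upgrade hops still ahead, following '0.90 and then 1.x'.

    Hypothetical helper for illustration only; it parses the version
    string and does not contact a cluster.
    """
    parts = current_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    if major >= 1:
        return []                # already on 1.x or later
    if minor >= 90:
        return ["1.x"]           # on 0.90.x, one hop left
    return ["0.90", "1.x"]       # older 0.x release, both hops left


if __name__ == "__main__":
    for version in ("0.20.5", "0.90.13", "1.0.1"):
        print(version, "->", remaining_hops(version))
```

Each hop would still involve upgrading the binaries and restarting the cluster, as described earlier in the thread.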


R_Toma · April 25, 2014, 9:56am

We have successfully upgraded from ES 0.20.x via 0.90 to 1.0.1. No
reindexing was needed.


warkolm · April 25, 2014, 10:31am

Good to know, thanks for the tip!

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

