Recommended way to reduce overload on ES


(ran) #1

Hi all,

I'm looking for the recommended solution for my situation.

We have time-based data.
Each day we index around 43,000,000 documents. Each document weighs around
0.6KB.
Our cluster contains 8 nodes (Amazon m1.xlarge and m3.xlarge machines)
with 12GB of memory, running ES version 0.20.5.
We have one index per month (~1,290,000,000 documents, 700GB, or 1.5TB with
replicas). Each index has 8 shards with 1 replica (16 shards in total).
We are currently storing 4 months of data.
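
As a quick sanity check on the figures above, the daily and monthly volumes can be derived from the per-day document count and the average document size. This is only a back-of-the-envelope sketch: the 30-day month and the function names are assumptions, and the on-disk index size can differ from the raw document volume.

```python
# Figures quoted in the post; everything else is derived from them.
DOCS_PER_DAY = 43_000_000
DOC_SIZE_KB = 0.6
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def daily_volume_gb(docs_per_day: int = DOCS_PER_DAY,
                    doc_size_kb: float = DOC_SIZE_KB) -> float:
    """Raw document volume per day in GB (1 GB = 1024 * 1024 KB)."""
    return docs_per_day * doc_size_kb / (1024 * 1024)


def monthly_docs(docs_per_day: int = DOCS_PER_DAY,
                 days: int = DAYS_PER_MONTH) -> int:
    """Number of documents that land in one monthly index."""
    return docs_per_day * days


if __name__ == "__main__":
    print(f"documents per month: {monthly_docs():,}")
    print(f"raw volume per day:  {daily_volume_gb():.1f} GB")
    print(f"raw volume per month: {daily_volume_gb() * DAYS_PER_MONTH:.0f} GB")
```

Under these assumptions the raw volume comes to about 24.6 GB per day and about 738 GB per month, which is in the same range as the ~700GB per monthly index quoted above.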

At the end of every month we experience high CPU usage on the
machines and many heap size failures on the nodes (which force a full
cluster restart).
Most of our queries search documents from the last 3 days.

We have several options in mind and want to choose the best one:

  1. Split the monthly index into daily indexes, 4 indexes per month (weekly), or 2
    indexes per month. We are not sure whether doing that adds a lot of overhead.
    Is a daily index excessive?
  2. Maybe adding shards could solve our problem? What is the recommended
    number of shards for our amount of data?
  3. Could upgrading to the latest version of ES help solve the problem?

Thanks!

Best regards, Ran

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d8bc3503-f36f-4f18-99bc-6a4e000045e5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Itamar Syn-Hershko) #2

You are currently letting ES handle sharding for you, but the
rolling-indexes approach (aka time-sliced indexes), where each index contains
all data for a given period of time and is named after that period, makes much
more sense. In other words: perform the sharding yourself at the index level, and use
aliases or multi-index queries to maintain that.
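
To illustrate the idea, here is a minimal sketch of how a client could work out which daily indexes a "last 3 days" query needs to touch. The `events-` prefix, the `YYYY.MM.DD` naming pattern and both function names are assumptions made for this example, not something from the thread; the only Elasticsearch feature relied on is that a search request can address several indexes by listing them comma-separated in the URL path.

```python
from datetime import date, timedelta
from typing import List


def daily_index_names(end: date, days: int, prefix: str = "events-") -> List[str]:
    """Names of the daily indexes covering the `days` days ending at `end`, oldest first.

    The prefix and date pattern are hypothetical naming choices.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return [
        prefix + (end - timedelta(days=offset)).strftime("%Y.%m.%d")
        for offset in range(days - 1, -1, -1)
    ]


def search_path(index_names: List[str]) -> str:
    """URL path for a multi-index search over the given indexes."""
    return "/" + ",".join(index_names) + "/_search"


if __name__ == "__main__":
    names = daily_index_names(date(2014, 4, 24), days=3)
    print(search_path(names))
```

With daily indexes, a query over the last 3 days only opens three small indexes instead of scanning a whole monthly one.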

This will help with retiring old indexes. The high CPU scenario is probably
due to a lot of deletes and segment merges that happen under the hood (and
possibly a wrong setting for the Java heap). Using the aforementioned
approach means you can just archive or delete an entire index and not use
TTLs or delete-by-query processes.
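
Retiring data then becomes a matter of picking out the index names that fall outside the retention window and dropping each of those indexes as a whole. The sketch below only does the selection step; the `events-YYYY.MM.DD` naming scheme, the function name and the 90-day window in the test data are assumptions for illustration.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def expired_indexes(index_names: Iterable[str], today: date, keep_days: int,
                    prefix: str = "events-") -> List[str]:
    """Return the daily indexes whose date is older than `keep_days` days before `today`.

    Names that do not match the (hypothetical) prefix + YYYY.MM.DD pattern are ignored.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a dated index, leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    existing = ["events-2014.01.01", "events-2014.04.23", "other-2014.01.01"]
    print(expired_indexes(existing, today=date(2014, 4, 24), keep_days=90))
```

Each name returned can then be removed by deleting that index, which avoids the per-document deletes and the segment merges they trigger.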

Deciding on the optimal size of an index in that scenario highly depends on
your data, usage patterns and a lot of experimenting.

That answers questions 1 and 2.

  3. Definitely, 0.20 is a very old version.

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/



(ran) #3

Hi Itamar, thanks for the quick reply.

Let me verify that I understood you correctly: are you recommending a
daily index for my scenario? Can ES handle that many indexes (365, for
instance), or would that add overhead?



(Itamar Syn-Hershko) #4

Elasticsearch can handle many open indexes on a cluster, but the advice is
to keep their number low per machine. I would try to aim for indexes at a
size of around 10GB, if this means a weekly index - so be it.

My advice is to use time-sliced indexes; it is not about the time span on which to
slice them. There's no rule of thumb for that one, I'm afraid.
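
For experimenting with the slice size, a rough calculation like the following can help compare options. It is only a sketch built on the numbers from the original post (about 700GB of primary data per month, 8 shards and 1 replica per index, 4 months of retention); the 30-day month and the function names are assumptions.

```python
import math

MONTHLY_PRIMARY_GB = 700  # from the original post
DAYS_PER_MONTH = 30       # assumption: a 30-day month


def index_size_gb(days_per_index: int) -> float:
    """Approximate primary size of one index that holds `days_per_index` days of data."""
    return MONTHLY_PRIMARY_GB * days_per_index / DAYS_PER_MONTH


def shard_copies(retention_days: int, days_per_index: int,
                 shards_per_index: int = 8, replicas: int = 1) -> int:
    """Total shard copies (primaries plus replicas) held in the cluster."""
    indexes = math.ceil(retention_days / days_per_index)
    return indexes * shards_per_index * (1 + replicas)


if __name__ == "__main__":
    for label, days in (("daily", 1), ("weekly", 7), ("monthly", 30)):
        print(f"{label:8s} index ~{index_size_gb(days):6.1f} GB, "
              f"{shard_copies(120, days)} shard copies for 120 days")
```

Under these assumptions a daily index already holds roughly 23GB and a weekly one roughly 163GB, and keeping 8 shards per index with daily slicing multiplies the shard count considerably, so the number of shards per index is worth revisiting together with the time span.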

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/



(Mark Walkom) #5

Upgrade ES! That is a very, very old version and there are numerous
performance improvements in the later versions.
You will need to reindex your data though, as the underlying Lucene version
has changed as well. You can take advantage of that process by building a new
cluster with the latest version, then migrating the data over and making tweaks
to suit your structure.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Itamar Syn-Hershko) #6

There's no need to reindex. It is enough to do a full cluster restart after
upgrading the binaries, and ES/Lucene will take care of the rest.

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/



(Mark Walkom) #7

Really?
I've seen most people recommend reindexing, especially given such a large jump in
the ES and Lucene versions.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 25 April 2014 09:48, Itamar Syn-Hershko itamar@code972.com wrote:

There's no need to reindex, it is enough to do full cluster restart after
upgrading the binaries and ES/Lucene will take care of the rest

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

On Fri, Apr 25, 2014 at 2:42 AM, Mark Walkom markw@campaignmonitor.comwrote:

Upgrade ES! That is a very very old version and there are numerous
performance improvements in the later versions.
You will need to reindex your data though, the underlying lucene version
has changed as well. You can leverage that process by building a new
cluster with the latest version, then migrating data over, and make tweaks
to suit your structure.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 24 April 2014 23:45, Itamar Syn-Hershko itamar@code972.com wrote:

Elasticsearch can handle many open indexes on a cluster, but the advice
is to keep their number low per machine. I would try to aim for indexes at
a size of around 10GB, if this means a weekly index - so be it.

My advice is on using time sliced indexes, not on the time span on which
to slice them. There's no thumb rule for that one I'm afraid.

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

On Thu, Apr 24, 2014 at 4:37 PM, ran@taykey.com wrote:

Hi Itamar , thanks for the quick reply.

Let me verify that I understood you correctly, you recommended to use
an daily index for my scenario? ES can handle that amount of indexes (365
for instance) ? or that procedure could make an overhead ?

On Thursday, April 24, 2014 3:55:22 PM UTC+3, Itamar Syn-Hershko wrote:

You are currently letting ES handle sharding for you, but using the
rolling-indexes approach (aka time sliced indexes) where indexes contain
all data for a given period of time and named after that period makes much
more sense. Read: perform the sharding yourself on the index level, and use
aliases or multi-index queries to maintain that.

This will help with retiring old indexes. The high CPU scenario is
probably due to a lot of deletes and segment merges that happen under the
hood (and possibly a wrong setting for the Java heap). Using the
aforementioned approach means you can just archive or delete an entire
index and not use TTLs or delete-by-query processes.

Deciding on the optimal size of an index in that scenario highly
depends on your data, usage patterns and a lot of experimenting.

That's to answer 1 & 2

  1. Definitely, 0.20 is a very old version

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

On Thu, Apr 24, 2014 at 3:48 PM, r...@taykey.com wrote:

Hi all,

I'm looking for the recommended solution for my situation.

We have a time based data.
Each day we index around 43,000,000 documents. Each document weighs
around 0.6k.
Our cluster contains 8 nodes (Amazon - m1.xlarge and m3.xlarge
machines) with 12GB mem, running ES version 0.20.5.
We have an index per month. (~1,290,000,000 documents, 700GB {1.5
Tera with rep), each index has 8 shards with 1 replica (total of 16 shards)
We are currently storing 4 months of data.

At the end of every month we experienced an high cpu usage on the
machines, and many heap size failures on the nodes (which cause a full
cluster restart).
Most of our queries search the documents within a range of 3 last
days.

We have several options in head, and wanted to choose the best one:

  1. split the index to daily index / 4 indexes per month (weekly) /
    2 indexes per month - we are not sure if there is a lot of overhead if we
    do that. is daily index is exaggerated ?
  2. Maybe adding shards can solve our problem ? what is
    the recommended number of shards for our amount of data?
  3. Upgrade to the latest version of ES could help solve that
    problem?

Thanks!

Best regards, Ran



(Itamar Syn-Hershko) #8

Lucene is and has always been backwards-compatible. On the first merge of
an index created with an earlier version it will get upgraded to the latest.

And you can't always reindex (think huge installations), so I believe the
ES core team actually intends not to require it.

You may want to upgrade gradually (0.90 and then 1.x) just to be safe, but
you don't have to reindex.

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

On Fri, Apr 25, 2014 at 3:02 AM, Mark Walkom markw@campaignmonitor.com wrote:

Really?
I've seen most people recommend reindexing, especially given such a large jump in
the ES + Lucene versions.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 25 April 2014 09:48, Itamar Syn-Hershko itamar@code972.com wrote:

There's no need to reindex. It is enough to do a full cluster restart after
upgrading the binaries, and ES/Lucene will take care of the rest.


On Fri, Apr 25, 2014 at 2:42 AM, Mark Walkom markw@campaignmonitor.com wrote:

Upgrade ES! That is a very old version, and there are numerous
performance improvements in the later versions.
You will need to reindex your data though, as the underlying Lucene version
has changed as well. You can take advantage of that process by building a new
cluster with the latest version, migrating the data over, and making tweaks
to suit your structure.


On 24 April 2014 23:45, Itamar Syn-Hershko itamar@code972.com wrote:

Elasticsearch can handle many open indexes on a cluster, but the advice
is to keep their number low per machine. I would try to aim for indexes
of around 10GB each; if this means a weekly index, so be it.

My advice is about using time-sliced indexes, not about the time span on
which to slice them. There's no rule of thumb for that one, I'm afraid.
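
As a rough sizing sketch (not part of the original replies), the figures from the first post can be turned into approximate index sizes per slicing period. It assumes a 30-day month and that on-disk size grows linearly with the number of days covered; `index_size_gb` is a hypothetical helper, and the 10GB target is only the ballpark mentioned above.

```python
# Figures quoted in the original post.
DOCS_PER_DAY = 43_000_000
MONTHLY_PRIMARY_GB = 700   # one monthly index, primaries only

# Assumptions made for this sketch (not stated in the thread).
DAYS_PER_MONTH = 30
TARGET_INDEX_GB = 10       # ballpark size suggested in the reply above


def index_size_gb(days_per_index: int) -> float:
    """Approximate primary size of an index covering `days_per_index` days,
    assuming size scales linearly with the number of days."""
    return MONTHLY_PRIMARY_GB / DAYS_PER_MONTH * days_per_index


if __name__ == "__main__":
    # Sanity check: 43M docs/day over 30 days matches the ~1.29B quoted.
    print(f"docs per month: {DOCS_PER_DAY * DAYS_PER_MONTH:,}")

    for label, days in [("daily", 1), ("weekly", 7),
                        ("half-month", 15), ("monthly", 30)]:
        size = index_size_gb(days)
        verdict = "above" if size > TARGET_INDEX_GB else "within"
        print(f"{label:>10}: ~{size:.1f} GB ({verdict} the {TARGET_INDEX_GB} GB target)")
```

With these numbers even a daily index comes out at roughly 23GB of primary data, so under these assumptions daily slicing is the finest of the options considered rather than an exaggeration. Real sizes depend on mappings and merges, so this is only a starting point for experiments.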


On Thu, Apr 24, 2014 at 4:37 PM, ran@taykey.com wrote:

Hi Itamar, thanks for the quick reply.

Let me verify that I understood you correctly: are you recommending
a daily index for my scenario? Can ES handle that number of indexes (365,
for instance), or would that create too much overhead?

On Thursday, April 24, 2014 3:55:22 PM UTC+3, Itamar Syn-Hershko wrote:

You are currently letting ES handle sharding for you, but using the
rolling-indexes approach (aka time sliced indexes) where indexes contain
all data for a given period of time and named after that period makes much
more sense. Read: perform the sharding yourself on the index level, and use
aliases or multi-index queries to maintain that.
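
A minimal sketch of the multi-index side of this, assuming daily indexes named with a hypothetical `events-` prefix and a `YYYY.MM.DD` suffix (neither comes from the thread). It only builds the index names and the comma-separated search path for the last few days; sending the request is left to whatever HTTP client is in use.

```python
from datetime import date, timedelta
from typing import List

INDEX_PREFIX = "events-"  # hypothetical naming scheme, not from the thread


def daily_index_name(day: date) -> str:
    """Name of the index that holds all documents for `day`."""
    return f"{INDEX_PREFIX}{day:%Y.%m.%d}"


def indexes_for_last_days(today: date, days: int) -> List[str]:
    """Index names covering `today` and the `days - 1` days before it."""
    return [daily_index_name(today - timedelta(days=offset))
            for offset in range(days)]


def search_path(index_names: List[str]) -> str:
    """Search path that targets several indexes at once by joining
    their names with commas."""
    return "/" + ",".join(index_names) + "/_search"


if __name__ == "__main__":
    names = indexes_for_last_days(date(2014, 4, 24), 3)
    print(search_path(names))
```

Since most queries in this scenario cover only the last 3 days, a search built this way touches three small indexes instead of one 700GB monthly index. An alias that is repointed daily would achieve the same thing without the client computing names.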

This will help with retiring old indexes. The high CPU scenario is
probably due to a lot of deletes and segment merges that happen under the
hood (and possibly a wrong setting for the Java heap). Using the
aforementioned approach means you can just archive or delete an entire
index and not use TTLs or delete-by-query processes.
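
To illustrate retiring whole indexes, here is a small sketch that picks the daily indexes that have aged out. The `events-` prefix, the date format, and the 120-day retention (roughly the 4 months mentioned in the question) are assumptions for illustration; the list of existing index names would come from the cluster.

```python
from datetime import date, datetime, timedelta
from typing import List

INDEX_PREFIX = "events-"  # hypothetical naming scheme, not from the thread
RETENTION_DAYS = 120      # assumption: roughly the 4 months kept today


def expired_indexes(index_names: List[str], today: date) -> List[str]:
    """Return the daily indexes whose date is older than the retention window.

    Names that do not match the expected prefix and date format are skipped,
    so unrelated indexes are never selected for deletion.
    """
    cutoff = today - timedelta(days=RETENTION_DAYS)
    expired = []
    for name in index_names:
        if not name.startswith(INDEX_PREFIX):
            continue
        try:
            day = datetime.strptime(name[len(INDEX_PREFIX):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave this index alone
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    existing = ["events-2013.12.24", "events-2013.12.25",
                "events-2014.04.24", "other-2013.01.01"]
    for name in expired_indexes(existing, date(2014, 4, 24)):
        # Each expired index can be dropped with a single index delete request.
        print(f"DELETE /{name}")
```

Dropping an index this way removes its files in one step, which avoids the per-document deletes and the follow-up segment merges that TTLs or delete-by-query would trigger.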

Deciding on the optimal size of an index in that scenario highly
depends on your data, usage patterns and a lot of experimenting.

That answers 1 & 2.

  3. Definitely, 0.20 is a very old version.


On Thu, Apr 24, 2014 at 3:48 PM, r...@taykey.com wrote:

Hi all,

I'm looking for the recommended solution for my situation.

We have time-based data.
Each day we index around 43,000,000 documents. Each document weighs
around 0.6k.
Our cluster contains 8 nodes (Amazon m1.xlarge and m3.xlarge
machines) with 12GB of memory, running ES version 0.20.5.
We have one index per month (~1,290,000,000 documents, 700GB; 1.5TB
with replicas). Each index has 8 shards with 1 replica (16 shards in total).
We are currently storing 4 months of data.

At the end of every month we experience high CPU usage on the
machines and many heap size failures on the nodes (which force a full
cluster restart).
Most of our queries search documents from the last 3 days.

We have several options in mind and want to choose the best one:

  1. Split the index into daily indexes, 4 indexes per month (weekly),
    or 2 indexes per month. We are not sure whether that adds a lot of overhead.
    Is a daily index excessive?
  2. Maybe adding shards can solve our problem? What is
    the recommended number of shards for our amount of data?
  3. Could upgrading to the latest version of ES help solve the
    problem?

Thanks!

Best regards, Ran



(R. Toma) #9

We have successfully upgraded from ES 0.20.x via 0.90 to 1.0.1. No
reindexing was needed.

On Friday, 25 April 2014 02:09:36 UTC+2, Itamar Syn-Hershko wrote:

Lucene is and has always been backwards-compatible. On the first merge of
an index created with an earlier version it will get upgraded to the latest.

And you can't always reindex (think huge installations), so I believe this
is actually the intent of ES core team to not require that.

You may want to upgrade gradually (0.90 and then 1.x) just to be safe, but
you don't have to reindex.



(Mark Walkom) #10

Good to know, thanks for the tip!

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 25 April 2014 10:09, Itamar Syn-Hershko itamar@code972.com wrote:

Lucene is and has always been backwards-compatible. On the first merge of
an index created with an earlier version it will get upgraded to the latest.

And you can't always reindex (think huge installations), so I believe this
is actually the intent of ES core team to not require that.

You may want to upgrade gradually (0.90 and then 1.x) just to be safe, but
you don't have to reindex.



Indexing documents to elasticsearch monthly?
(system) #11