Keep the number of segments to 5


(Ophir Michaeli) #1

Hi everyone,

I'm running a system of 2 nodes with 66 million documents each (more
nodes will be added, up to a total of 500 million documents).
I want to keep the number of segments at 5 (otherwise searching while
indexing is too slow), which means running optimize periodically on the
delta while indexing and searching are still running.
Is this good practice, or are there better ways to get good performance?

Best Regards,
Ophir
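For reference, the periodic optimize described above could be issued roughly as follows. This is a minimal sketch that assumes an Elasticsearch 1.x node at a hypothetical `http://localhost:9200` and a hypothetical index named `docs`. `_optimize` with `max_num_segments` is the 1.x endpoint (later versions renamed it `_forcemerge`). The replies in this thread argue against running it on an index that is still being written to.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def optimize_request(base_url: str, index: str, max_num_segments: int) -> Request:
    """Build (but do not send) a POST to the Elasticsearch 1.x _optimize endpoint."""
    query = urlencode({"max_num_segments": max_num_segments})
    url = f"{base_url.rstrip('/')}/{index}/_optimize?{query}"
    return Request(url, method="POST")


def send(req: Request, timeout: float = 3600.0) -> bytes:
    """Send the request and return the raw response body (JSON from Elasticsearch)."""
    with urlopen(req, timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical host and index name; adjust for your own cluster.
    req = optimize_request("http://localhost:9200", "docs", 5)
    print(req.get_method(), req.full_url)
    # prints: POST http://localhost:9200/docs/_optimize?max_num_segments=5
    # send(req)  # uncomment to actually trigger the merge
```

Building the request separately from sending it keeps the URL easy to inspect before committing to an expensive merge.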

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b33ae8c6-667a-45cd-8b1e-e7d42bb8e99e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Itamar Syn-Hershko) #2

How did you arrive at this number of 5?

To begin with, what sizes are your shards? What are the specs of your
servers?

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/


(Ophir Michaeli) #3

I got to 5 through some performance tests; it could be that 1 or 10 are
also OK.
Each shard holds 33 million documents (2 shards on each 66-million-document
node, one node per machine).
Each server runs Windows Server 2008 R2 with 16GB of RAM.
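A quick back-of-the-envelope projection of this layout to the planned 500 million documents, as a sketch. It assumes documents stay evenly spread at the reported 33 million per shard and 2 shards per node, and it ignores replicas. In Elasticsearch 1.x the number of primary shards is fixed when an index is created, so this is only a sizing estimate.

```python
import math

DOCS_PER_SHARD = 33_000_000   # reported in this thread
SHARDS_PER_NODE = 2           # reported: 2 shards per node, one node per machine
TARGET_DOCS = 500_000_000     # planned total from the first post


def shards_needed(total_docs: int, docs_per_shard: int) -> int:
    # Round up: a partially filled shard still has to live somewhere.
    return math.ceil(total_docs / docs_per_shard)


def nodes_needed(total_docs: int, docs_per_shard: int, shards_per_node: int) -> int:
    # Round up again when the shards don't divide evenly across nodes.
    return math.ceil(shards_needed(total_docs, docs_per_shard) / shards_per_node)


if __name__ == "__main__":
    print(shards_needed(TARGET_DOCS, DOCS_PER_SHARD))                  # 16
    print(nodes_needed(TARGET_DOCS, DOCS_PER_SHARD, SHARDS_PER_NODE))  # 8
```

As a sanity check, the current 132 million documents give 4 shards on 2 nodes, which matches the cluster described above.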


(Itamar Syn-Hershko) #4

By shard size I meant on disk.

There's a lot you can do to optimize performance; worrying about the number
of segments is really the last of them.

Look into getting more RAM (32GB is my personal recommendation), using
filters, making sure you use enough servers (50GB shards on a 16GB RAM server
isn't cool, especially if you use aggregations), looking into codecs, and much
more. There's no need for you to look into segments, especially since, if
this is a live index that is being written to, merging segments carries a
large cost (CPU, IO and GC).
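As an example of "using filters": in Elasticsearch 1.x, exact-match conditions that don't need scoring can go in the `filter` part of a `filtered` query, where they are not scored and can be cached. This minimal sketch only builds the JSON request body. The field names `body` and `status` are hypothetical.

```python
import json


def filtered_query(text: str, status: str) -> dict:
    """Build an Elasticsearch 1.x search body using the `filtered` query.

    `body` and `status` are hypothetical field names. The `match` part is
    scored; the exact-match `term` filter only narrows the result set.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match": {"body": text}},
                "filter": {"term": {"status": status}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(filtered_query("segment merging", "published"), indent=2))
```

The `filtered` query was deprecated in Elasticsearch 2.0 in favour of a `bool` query with a `filter` clause, so this shape applies to the 1.x releases discussed here.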


(Ophir Michaeli) #5

Shard size on disk is 115GB (230GB for both).
Adding RAM is not an option right now. I got good results when the index was
optimized, so why not try optimizing the delta (for example, optimize after
every million added docs, and if that is too expensive, every half million,
and so on)?
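Ophir's proposal (optimize after every N newly indexed documents, halving N if that proves too expensive) boils down to a counter like the one below. This is a hypothetical sketch: `DeltaOptimizeTrigger` is not an Elasticsearch feature, and issuing the actual optimize call is left to the caller.

```python
class DeltaOptimizeTrigger:
    """Hypothetical helper: signal that an optimize is due after `every` new docs."""

    def __init__(self, every: int = 1_000_000) -> None:
        if every <= 0:
            raise ValueError("every must be positive")
        self.every = every
        self._since_last = 0

    def record(self, indexed: int) -> bool:
        """Count a batch of newly indexed docs; return True when an optimize is due."""
        self._since_last += indexed
        if self._since_last >= self.every:
            self._since_last = 0  # reset the delta counter after triggering
            return True
        return False

    def halve(self) -> None:
        # "If that is too expensive, then half a million, and so on."
        self.every = max(1, self.every // 2)


if __name__ == "__main__":
    trigger = DeltaOptimizeTrigger(every=1_000_000)
    print([trigger.record(250_000) for _ in range(8)])
    # prints: [False, False, False, True, False, False, False, True]
```

The counter only decides when to merge; the cost of each merge on a live index, which replies #6 and #7 warn about, is unchanged.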


(Itamar Syn-Hershko) #6

Because Elasticsearch will usually get the merge policy right, and you are
better off not trying to fine-tune it yourself.

Based on those numbers, I'd say you should add more servers, if not RAM.
230GB on 16GB servers is going to cause a lot of thrashing, definitely if
you are doing a lot of aggregation operations (aka faceting).

You can probably find ways to fine-tune and squeeze more performance out of
what you currently have (again: using filters, codecs and other advanced
configs), but it's probably wiser to just scale out.
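To make the "scale out" suggestion concrete, the on-disk data per node relative to RAM can be computed from the numbers reported in this thread (115GB per shard, 4 shards across 2 nodes, 16GB RAM per server). This is plain arithmetic under the assumption that shards are spread evenly and replicas are ignored; the thread does not give a target ratio, so none is assumed here.

```python
SHARD_SIZE_GB = 115   # reported on-disk size of one shard
RAM_PER_NODE_GB = 16  # reported RAM per server


def data_to_ram_ratio(total_shards: int, nodes: int, ram_gb: float = RAM_PER_NODE_GB) -> float:
    """On-disk index data per node divided by RAM per node (even shard spread assumed)."""
    data_per_node_gb = total_shards * SHARD_SIZE_GB / nodes
    return data_per_node_gb / ram_gb


if __name__ == "__main__":
    shards = 4  # 2 nodes x 2 shards, as described in this thread
    print(data_to_ram_ratio(shards, nodes=2))             # 14.375 (230GB per node on 16GB)
    print(data_to_ram_ratio(shards, nodes=4))             # 7.1875 (scale out to 4 nodes)
    print(data_to_ram_ratio(shards, nodes=2, ram_gb=32))  # 7.1875 (32GB RAM, as suggested in #4)
```

Doubling the node count and doubling the RAM halve the ratio equally in this simple model; the difference is that adding nodes also spreads CPU and IO load.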


(Michael McCandless) #7

Also, optimize is an incredibly costly (CPU, IO) operation. Really, it
should only be done when you know the index will no longer change, e.g.
when the daily log index is done being written.

Mike McCandless

http://blog.mikemccandless.com
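Mike's rule of thumb, optimizing only indices that will no longer change, fits time-based indices well. A minimal sketch, assuming hypothetical daily index names of the form `logs-YYYY.MM.DD`: it selects indices for days before today and builds Elasticsearch 1.x `_optimize` URLs for them, leaving today's live index alone. The host URL and the choice of `max_num_segments=1` are assumptions.

```python
from datetime import date, datetime
from typing import Iterable, List

PREFIX = "logs-"  # hypothetical daily index naming scheme


def finished_daily_indices(names: Iterable[str], today: date) -> List[str]:
    """Return daily indices whose day is over (today's index is still being written)."""
    done = []
    for name in names:
        if not name.startswith(PREFIX):
            continue  # not a daily log index
        try:
            day = datetime.strptime(name[len(PREFIX):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date in the expected format
        if day < today:
            done.append(name)
    return sorted(done)


def optimize_urls(names: Iterable[str], today: date,
                  base_url: str = "http://localhost:9200") -> List[str]:
    # max_num_segments=1 is an assumed choice for indices that no longer change.
    return [f"{base_url}/{n}/_optimize?max_num_segments=1"
            for n in finished_daily_indices(names, today)]


if __name__ == "__main__":
    indices = ["logs-2014.07.12", "logs-2014.07.13", "logs-2014.07.14", "users"]
    for url in optimize_urls(indices, today=date(2014, 7, 14)):
        print(url)
```

Only the two finished days are selected; `logs-2014.07.14` and the non-daily `users` index are skipped.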


(Ophir Michaeli) #8

Is there an optimal ratio between index disk size and RAM?


(Ophir Michaeli) #9

Is there an optimal ratio between index disk size and RAM?

On Monday, July 14, 2014 12:44:14 PM UTC+3, Michael McCandless wrote:

Also, optimize is an incredibly costly (CPU, IO) operation. Really, it
should only be done when you know the index will no longer change, e.g.
when the daily log index is done being written.

Mike McCandless

http://blog.mikemccandless.com



(Itamar Syn-Hershko) #10

Again, it highly depends on your data and your usage.

My rule of thumb is to have enough memory to hold all of the active
indexes. So on a 16GB server, Elasticsearch gets 8GB of heap, which means
you can handle about 8GB of indexes in total: the OS uses the remaining
memory to mmap the index files, and Elasticsearch uses its 8GB heap to
efficiently load field data and perform indexing. This applies to heavily
used indexes that require fast response times even under heavy writes.

That's just a rule of thumb (RoT). It can obviously be stretched, and
efficiently so, but definitely not to 230GB on a 16GB server, even when no
aggregations are involved.
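Below is a back-of-the-envelope sketch of this rule of thumb. It assumes that half of a server's RAM goes to the Elasticsearch heap and the other half is left to the OS page cache for the mmapped index files. The 50% split and the function names are illustrative assumptions, not a fixed setting.

```python
import math


def rule_of_thumb_capacity_gb(server_ram_gb: float, heap_fraction: float = 0.5) -> float:
    """Active index size (GB) a server can keep in the OS page cache.

    Assumption: `heap_fraction` of RAM goes to the Elasticsearch heap and
    the rest is left to the OS, which caches the mmapped index files.
    """
    heap_gb = server_ram_gb * heap_fraction
    return server_ram_gb - heap_gb


def servers_needed(active_index_gb: float, server_ram_gb: float,
                   heap_fraction: float = 0.5) -> int:
    """Smallest number of servers whose combined page cache holds the index."""
    capacity_gb = rule_of_thumb_capacity_gb(server_ram_gb, heap_fraction)
    return math.ceil(active_index_gb / capacity_gb)


if __name__ == "__main__":
    print(rule_of_thumb_capacity_gb(16))  # 8.0
    print(servers_needed(230, 16))        # 29
    print(servers_needed(230, 32))        # 15
```

Taken literally, 230GB of active index would need 29 servers with 16GB each, or 15 with 32GB each. That gap is why the rule is something to stretch rather than apply to the letter. Even so, putting all 230GB on a single 16GB box is far outside it.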


On Mon, Jul 14, 2014 at 4:06 PM, Ophir Michaeli ophirmichaeli@gmail.com
wrote:

Is there an optimal ratio between index disk size and RAM?



(system) #11