I'm running a system of 2 nodes with 66 million documents each (additional
nodes will be added, up to a total of 500 million documents).
I want to keep the number of segments at 5 (otherwise search while indexing
is too slow), which means running optimize on the delta at regular intervals
while indexing and search are still running.
Is this good practice, or are there better ideas for good performance?
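
For concreteness, here is a minimal sketch of the kind of scheduled optimize
described above, using Python with the requests library against the 1.x-era
_optimize endpoint (the host, index name, and interval are placeholders):

```python
import time
import requests

ES = "http://localhost:9200"   # placeholder host
INDEX = "docs"                 # placeholder index name
INTERVAL = 60 * 60             # e.g. once an hour

while True:
    # ES 1.x: _optimize merges each shard down to at most
    # max_num_segments segments (later renamed _forcemerge).
    r = requests.post("%s/%s/_optimize" % (ES, INDEX),
                      params={"max_num_segments": 5})
    print(r.status_code)
    time.sleep(INTERVAL)
```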

On Sunday, July 13, 2014 3:15:57 PM UTC+3, Itamar Syn-Hershko wrote:
How did you arrive at this number of 5?
To begin with, what sizes are your shards? What are the specs of your
servers?

I got to 5 by doing some performance tests; it could be that 1 or 10 are
also fine.
Each shard is 33 million documents (2 shards per 66-million-document node,
one node per machine).
Each server runs Windows Server 2008 R2 with 16GB RAM.
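
One way to answer the shard-size question in the terms asked (disk size and
segment counts rather than document counts) is via the cat shards API and
the index segments API; a sketch, again with placeholder host and index
names:

```python
import requests

ES = "http://localhost:9200"   # placeholder host
INDEX = "docs"                 # placeholder index name

# _cat/shards reports the on-disk store size of every shard.
print(requests.get(ES + "/_cat/shards", params={"v": "true"}).text)

# The segments API shows how many Lucene segments each shard copy holds.
segs = requests.get("%s/%s/_segments" % (ES, INDEX)).json()
for shard_id, copies in segs["indices"][INDEX]["shards"].items():
    for copy in copies:
        print("shard %s: %d segments" % (shard_id, len(copy["segments"])))
```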

On Sunday, July 13, 2014 4:06:51 PM UTC+3, Itamar Syn-Hershko wrote:
By shard size I meant on disk.
There's a lot you can do to optimize performance; worrying about the number
of segments is really the last of them.
Look into getting more RAM (32GB is my personal recommendation), using
filters, and making sure you use enough servers (50GB shards on a 16GB RAM
server isn't cool, especially if you use aggregations); also look into
codecs and much more. There's no need for you to look into segments,
especially since, if this is a live index that is being written to, there's
a large cost (CPU, IO and GC) associated with merging segments.
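
To illustrate the "use filters" advice: in the 1.x query DSL, the
non-scoring parts of a query can go into the filter half of a filtered
query, where Elasticsearch can cache them. A sketch with made-up field
names:

```python
import requests

ES = "http://localhost:9200"   # placeholder host

# ES 1.x "filtered" query: the filter clause is cacheable and skips
# scoring, so exact-match constraints belong there, not in the query part.
body = {
    "query": {
        "filtered": {
            "query": {"match": {"title": "elasticsearch"}},  # scored
            "filter": {"term": {"status": "published"}}      # cached
        }
    }
}
r = requests.get(ES + "/docs/_search", json=body)
print(r.json()["hits"]["total"])
```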

Shard size on disk is 115GB (230GB for both).
Adding RAM is not an option right now. I got good results when the index
was optimized, so why not optimize just the delta (for example, optimize
after each added million docs; if that is too expensive, then after half a
million, and so on)?
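
A sketch of that "optimize the delta" idea as stated, triggering an
optimize each time roughly a million new documents have arrived (placeholder
names again; note that the replies below argue against doing this on a live
index):

```python
import time
import requests

ES = "http://localhost:9200"   # placeholder host
INDEX = "docs"                 # placeholder index name
BATCH = 1000000                # optimize after each added million docs

def doc_count():
    return requests.get("%s/%s/_count" % (ES, INDEX)).json()["count"]

last = doc_count()
while True:
    time.sleep(300)            # poll every five minutes
    now = doc_count()
    if now - last >= BATCH:
        requests.post("%s/%s/_optimize" % (ES, INDEX),
                      params={"max_num_segments": 5})
        last = now
```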

Because Elasticsearch will usually get the merge policy right, and you are
better off not trying to fine-tune it yourself.
Based on those numbers, I'd say you should add more servers, if not RAM.
230GB on 16GB servers is going to cause a lot of thrashing, definitely if
you are doing a lot of aggregation operations (aka faceting).
You can probably find ways to fine-tune and squeeze more performance out of
what you currently have (again: using filters, codecs and other advanced
configs), but it's probably just wiser to scale out.
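
For reference, these are the kinds of merge-policy knobs being left to
Elasticsearch here. The values shown are the 1.x tiered-policy defaults,
and the point of the advice above is that you normally should not change
them; the sketch only shows where they would live:

```python
import requests

ES = "http://localhost:9200"   # placeholder host
INDEX = "docs"                 # placeholder index name

# Dynamic index settings for the tiered merge policy in ES 1.x.
# Setting them to their defaults is a no-op; shown for illustration only.
requests.put("%s/%s/_settings" % (ES, INDEX), json={
    "index.merge.policy.segments_per_tier": 10,      # 1.x default
    "index.merge.policy.max_merged_segment": "5gb"   # 1.x default
})
```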

On Monday, July 14, 2014 12:44:14 PM UTC+3, Michael McCandless wrote:
Also, optimize is an incredibly costly (CPU, IO) operation. Really, it
should only be done when you know the index will no longer change, e.g.
when the daily log index is done being written.
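
A sketch of that pattern, assuming daily indices named like logs-YYYY.MM.DD:
once yesterday's index has stopped receiving writes, merging it fully is a
one-time cost that pays off on every later search:

```python
import datetime
import requests

ES = "http://localhost:9200"   # placeholder host

# Yesterday's daily index is complete, so it is safe to merge it
# down to a single segment per shard.
yesterday = datetime.date.today() - datetime.timedelta(days=1)
index = "logs-%s" % yesterday.strftime("%Y.%m.%d")
requests.post("%s/%s/_optimize" % (ES, index),
              params={"max_num_segments": 1})
```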

Is there an optimal ratio between index disk size and RAM?

Again, it highly depends on your data and your usage.
My rule of thumb is to have enough memory to hold the entire active
indexes. So if you have a 16GB server, Elasticsearch gets 8GB, which means
you can deal with indexes of up to 8GB in total. The OS will use the memory
left to it to mmap the index files, and Elasticsearch will use its 8GB to
efficiently load field data and perform indexing. This applies to heavily
used indexes which require fast response times even under heavy writes.
That's also just a rule of thumb, and it can obviously be stretched, and
efficiently so, but definitely not to 230GB on a 16GB server, even when
there's no aggregation involved.
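
That rule of thumb as arithmetic, applied to the numbers in this thread:

```python
def max_index_size_gb(ram_gb):
    # Rule of thumb above: give Elasticsearch half the RAM and keep the
    # total active index size within that heap-sized budget.
    return ram_gb / 2.0

print(max_index_size_gb(16))        # 8.0 GB index budget per 16GB node
print(230 / max_index_size_gb(16))  # ~29x over budget at 230GB on disk
```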