Hardware recommendations

Hello,

Are there any hardware recommendations for ES?
It seems that it's very difficult to evaluate and really dependent on the
use case.
But are there some well-known bottlenecks or typical configurations? I/O or
CPU?
My use case is pretty facet-intensive.

Julien


In general, memory is never wasted. If you are doing heavy faceting, geo
or sorting, memory will probably be your limiting factor. And even when
you aren't faceting, more memory will mean more segments of your index are
cached into the file system cache (which means more diskless operations).
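
For what it's worth, a quick way to watch that memory is the node stats
API. A minimal sketch in Python (assuming a node on localhost:9200; the
exact stat field names can vary a bit between ES versions, hence the
defensive lookups):

# Rough sketch: poll the node stats API to see how much JVM heap and
# field data (the in-memory structures that back faceting/sorting) each
# node is using. Assumes a node on localhost:9200; stat field names can
# differ slightly between ES versions, hence the defensive .get() calls.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:9200/_nodes/stats") as resp:
    stats = json.loads(resp.read().decode("utf-8"))

for node_id, node in stats.get("nodes", {}).items():
    heap = node.get("jvm", {}).get("mem", {}).get("heap_used_in_bytes", 0)
    fielddata = (node.get("indices", {})
                     .get("fielddata", {})
                     .get("memory_size_in_bytes", 0))
    print("%s: heap=%.1f MB, fielddata=%.1f MB"
          % (node.get("name", node_id), heap / 2**20, fielddata / 2**20))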

It's hard to make a blanket statement about CPU and disk I/O. More is
always better, but how much you need is impossible to determine without
testing. CPU is used heavily by some search types, as well as during
segment merges while indexing. Disk I/O is a bottleneck during heavy
indexing, shard relocation and potentially some searches/facets if you hit
all the documents at once.

With regard to disk, if you RAID, use RAID in a performance mode (striping)
rather than for availability; you don't need HA because you can use
replicas. If you can get SSDs, they are a huge boost in performance over
spinning disks.

Lastly, "medium" to "large" boxes tend to work better than "small" boxes.
Those are quoted for a reason, because it's hard to say what is "large"
for a given context. But it tends to be more economical to start with
medium/large machines and then start scaling out.

-Zach


Thanks for the advice.

Julien


Sure thing. This isn't an official recommendation, but I personally would
not want to use a machine that has less than 16GB of RAM. If you happen to
have a light data-set then you'll have plenty of space to grow, and if you
have a heavy data-set then this isn't an unreasonable starting place.

32GB of RAM seems to be the sweet spot and is what I would personally build
a cluster out of.

Having said that, nothing beats a few good benchmarks to figure out what
fits your data! =)
-Zach
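
To make "a few good benchmarks" a bit more concrete, here is a minimal
Python sketch that times a facet query; the index name, field name and
facet syntax are placeholders/assumptions, so adapt them to your own data
and ES version:

# Rough benchmarking sketch: time a terms facet query a few times and
# report the average. "myindex" and "category" are placeholders for your
# own index and field; the facet syntax below is the old (pre-aggregations)
# DSL, so adjust to whatever your version expects.
import json
import time
import urllib.request

body = json.dumps({
    "query": {"match_all": {}},
    "facets": {"by_category": {"terms": {"field": "category", "size": 10}}},
}).encode("utf-8")

url = "http://localhost:9200/myindex/_search?search_type=count"
timings = []
for _ in range(10):
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    start = time.time()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    timings.append(time.time() - start)

print("average: %.1f ms over %d runs"
      % (1000 * sum(timings) / len(timings), len(timings)))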


Hi Zach,
Could you define "light data-set", please?

We have a 4-machine cluster with 3.7GB of memory each, and we have around 8
open indexes of 25GB each with a replication factor of 2 and 2 shards.
We get Java heap space out-of-memory errors quite regularly. Could you
suggest hardware configurations for this scenario? The number of indexes
will also be increasing in the future, so anything that would be a stable
long-term solution would help.

Sambhav


I would agree with Zachary: usually memory is the best way to increase
performance. One interesting thing we noticed: don't give all the memory to
ES; sometimes you get a good speedup if you leave memory for the system file
cache. The moment you have a lot of disk I/O, SSDs help (search sped up by
a factor of 2 when using SSDs). But in my eyes the major issue is trying to
avoid disk I/O. So if you have a monitoring tool displaying disk I/O, play
around and watch how much disk I/O you have during a certain period of time.
If you have enough memory, it goes down to almost zero. Also helpful is a
tool like the HQ plugin, where you can see how long search and fetch take
on average, and draw your own conclusions.

Cheers,
Andrej
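
If you don't have a monitoring tool handy, here is a crude, Linux-only
Python sketch of the same idea; the device name is a placeholder:

# Crude, Linux-only sketch of "watch disk I/O over a period of time":
# sample /proc/diskstats before and after a test window and report how
# many reads actually hit the disk. "sda" is a placeholder; use the
# device that holds your ES data directory.
import time

def reads_completed(device="sda"):
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[3])  # field 4: reads completed
    raise ValueError("device not found: %s" % device)

before = reads_completed()
time.sleep(60)  # run your search/facet workload during this window
after = reads_completed()
print("disk reads in the last 60s: %d" % (after - before))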


"Light data-set" is nearly impossible to define. It's a combination of the
size of your machine, the size/complexity of your data and the types of
queries you are performing. If you only have one gigabyte of memory, you
can still execute simple search across millions of documents, but be unable
to facet across the whole data set due to memory constraints. So for
search, it's a "light" data-set. But for faceting, it is "heavy".

If your nodes are hitting OOM often, by definition the data-set is "too
heavy" for your given hardware/data/query combination. You'll need to add
more memory, add more nodes, or re-evaluate the facets you are running.

Definitely agree with what Andrej says too: the general rule of thumb is to
leave around 50% of the available memory for the file system cache.

-Zach
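
As a back-of-the-envelope sketch of that 50% rule (assuming Linux/Unix, and
assuming the ES_HEAP_SIZE variable read by the startup scripts of this era;
adjust for however you launch your nodes):

# Back-of-the-envelope sketch of the 50% rule of thumb: give roughly half
# of physical RAM to the ES heap and leave the rest to the OS file system
# cache. ES_HEAP_SIZE is an assumption about how you start your nodes.
# Linux/Unix only (uses sysconf).
import os

total_ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30
heap_gb = max(1, int(total_ram_gb // 2))
print("Total RAM: %.1f GB -> suggested ES_HEAP_SIZE=%dg" % (total_ram_gb, heap_gb))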


There is no bottleneck / practical limit I am aware of. Well, if you combine
more than a few thousand machines into a single cluster, I'm sure it will
get pretty tight because of cluster state propagation :)

If you want facets, make sure you have enough RAM. You can put more RAM into
fewer machines (accepting less CPU power per unit of RAM) or add more
machines (scaling RAM with CPU power).

To determine overall RAM, pick a single machine and check which of your
facets (it depends on cardinality) and how many of them you can use before
you get an OOM. Then you can estimate how many machines you will need in
total. It's as simple as that.

Jörg
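
As a worked example of that extrapolation (every number below is a made-up
placeholder; plug in your own measurements):

# Worked example of the estimation Jörg describes: measure the field data
# your facets load on a single test node for a known number of documents,
# then extrapolate to the full corpus. All values are illustrative only.
sample_docs = 1_000_000          # docs indexed on the test node
fielddata_gb_sample = 2.5        # field data measured after running your facets
total_docs = 40_000_000          # expected production document count
heap_per_node_gb = 12.0          # heap per node you can spend on field data

total_fielddata_gb = fielddata_gb_sample * (total_docs / sample_docs)
nodes_needed = int(-(-total_fielddata_gb // heap_per_node_gb))  # ceiling division
print("Estimated field data: %.0f GB -> roughly %d node(s)"
      % (total_fielddata_gb, nodes_needed))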
