Design HA ES for 16 TB logs data | Is SAN storage a good idea?


(sirkubax) #1

Hi,

I'm testing and planning an implementation for 16 TB of log data (1 month
of daily indexes, about 530 GB/day). Indexes are deleted after 1 month (the
TTL is 1 month).

Document sizes vary from a few bytes to 1 MB (average of ~3 KB).
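
As a rough sanity check, the figures above can be turned into an average
ingest rate. This is only a sketch: it assumes the 530 GB/day index size
can be treated as raw document volume (ignoring indexing overhead and
compression) and uses decimal units.

```python
# Rough ingest-rate estimate from the figures in the post.
# Assumption: index size on disk is treated as raw document volume.
DAILY_BYTES = 530 * 10**9        # ~530 GB/day
AVG_DOC_BYTES = 3 * 10**3        # ~3 KB average document
SECONDS_PER_DAY = 24 * 60 * 60

docs_per_day = DAILY_BYTES // AVG_DOC_BYTES
docs_per_second = docs_per_day // SECONDS_PER_DAY

print(f"{docs_per_day:,} docs/day, about {docs_per_second:,} docs/s on average")
```

Real traffic is rarely uniform, so peak rates are likely well above this
average.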

We have 2 data centers, and the requirement is to keep the dataset
accessible when one of them is down.

My current implementation looks like this:

cluster.routing.allocation.awareness.attributes: datacenter

cluster.routing.allocation.awareness.force.datacenter.values: datacenterA,datacenterB

So the indexes are spread across nodes in datacenterA and datacenterB.
Each index has 1 replica, so the primary and replica copies are balanced
between the two locations.

Problem A: SAN storage

I have been offered SAN storage space that could be attached to any of the
ES node machines. In the index-plus-replica scenario, I need 2 * 16 TB =
32 TB of disk storage. With RAID 1, that makes 64 TB of "real world" disk
storage.
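
The storage arithmetic can be spelled out as follows. This is a minimal
sketch that assumes "1 month" means 30 days of retention; the post rounds
the result up to 16, 32 and 64 TB.

```python
# Storage footprint for one month of daily indexes.
# Assumption: "1 month" is taken as 30 days.
DAILY_INDEX_GB = 530
RETENTION_DAYS = 30

primary_gb = DAILY_INDEX_GB * RETENTION_DAYS   # one copy of the data
with_replica_gb = primary_gb * 2               # primary + 1 replica
raid1_raw_gb = with_replica_gb * 2             # RAID 1 mirrors every disk

for label, gb in [("primary only", primary_gb),
                  ("primary + replica", with_replica_gb),
                  ("raw disks under RAID 1", raid1_raw_gb)]:
    print(f"{label}: {gb / 1000:.1f} TB")
```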

Using "independent, high quality" storage might (if ES allows it) reduce
the requirement to 16 TB. I say "if ES allows it" because, as far as I
know, nodes cannot "share" a dataset. If several nodes run on common
storage, each creates its own unique path. Is that correct?

Could I run an ES cluster where indexes have no replicas, yet a failure of
nodeX does not affect the cluster's access to nodeX's dataset?

In my current idea of a no-replica scenario, powering off (or a failure
of) "NodeXDatacenterA" would make datasetX unreadable in the cluster, at
least until I start NodeXDatacenterB, which would have access to datasetX
(the same path configuration). Of course NodeXDatacenterA and
NodeXDatacenterB could not both run at the same time.

I guess that the workaround suggested above is not "in the ES philosophy
of shared storage and self-balancing". It would make upgrading a single
node problematic, reduce fault tolerance, etc.

What makes me consider this solution is that I have some "24-core, 64 GB
RAM, limited disk storage" machines available, plus 16 TB of SAN storage
that I could mount on those machines.

Do you have any suggestions on using SAN storage? Is it a good idea at
all?

Problem B: Design

My current idea for building the environment is to order N (6-8, or more?)
machines with big HDDs and run a "normal ES cluster" with shards and
replicas stored locally.

The question is: how many of them would be enough? :)

With 24 cores, 64 GB RAM and 4 TB of disk each, it would take 4 machines
to run a minimal cluster in a single datacenter, and 8 machines in total
for both datacenters. What do you think about the likely performance?

Actually, to be storage-safe I would go for 6-8 TB of disk storage per
machine. That would allow running on "less than 4" nodes when operating in
a single datacenter.
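
The node counts above follow from dividing one full copy of the data
(16 TB per datacenter) by the disk per node. The sketch below makes that
explicit; `nodes_needed` and the `max_fill` headroom factor are
illustrative names, not anything ES provides, and the 0.85 value is only an
example of leaving some free space.

```python
import math

def nodes_needed(dataset_gb, disk_per_node_gb, max_fill=1.0):
    """Nodes required to hold dataset_gb when each disk is filled
    to at most max_fill (1.0 = completely full, no headroom)."""
    usable_gb = disk_per_node_gb * max_fill
    return math.ceil(dataset_gb / usable_gb)

DATASET_GB = 16_000  # one full copy per datacenter, from the post

for disk_gb in (4_000, 6_000, 8_000):
    full = nodes_needed(DATASET_GB, disk_gb)
    # Hypothetical headroom: keep disks at most 85% full.
    with_headroom = nodes_needed(DATASET_GB, disk_gb, max_fill=0.85)
    print(f"{disk_gb // 1000} TB/node: {full} nodes "
          f"(or {with_headroom} with 15% headroom) per datacenter")
```

This only covers capacity. Whether that many nodes can sustain the
indexing and query load still has to be tested.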

I wonder if 64GB RAM would be enough.

The whole process of acquiring new servers takes time - is there a "good
practice" guide for determining the minimum number of servers in the
cluster?

How many shards would you suggest?

Question C:

I have seen performance advice to run "client" ES nodes on machines
without data storage, so that they do not suffer from I/O issues. With 2
of them, how would you scale it?

Do you think it's worth having 2 client-only machines, or is it better to
have 2 more "complete" nodes with data storage as extra nodes in the ES
cluster (so 10 instead of 8 nodes)?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0565daed-f398-48da-be62-8646844581d0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

A. There are many unknown factors regarding "SAN storage", e.g. what are
the latency and the IOPS? Most SANs are black boxes and do not scale with
the number of connected hosts, so you should test yours thoroughly to make
an educated decision. There is no simple "yes" or "no". Personally, I
would never use a SAN, only local storage, because a SAN comes with the
risk of becoming a bottleneck.

B. Whatever the specifications, you should first test whether your
configuration meets your performance requirements; there is no "yes" or
"no". The minimum number of nodes is 3, to avoid situations like split
brain.

C. You should expect more throughput if you can decouple client workload
from server workload, but that also depends on your workload pattern and
your tests. For example, if you must preprocess data before indexing, or
postprocess search results, you will welcome additional nodes as a great
help.
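
The 3-node minimum in point B comes from majority voting among
master-eligible nodes. A small sketch of the usual quorum rule (a strict
majority, as used for the `discovery.zen.minimum_master_nodes` setting in
ES versions of this era); the function name here is illustrative:

```python
def minimum_master_nodes(master_eligible):
    """Strict majority of master-eligible nodes."""
    return master_eligible // 2 + 1

for n in (2, 3, 8):
    quorum = minimum_master_nodes(n)
    tolerated = n - quorum  # nodes that can be lost while keeping a quorum
    print(f"{n} master-eligible nodes: quorum {quorum}, "
          f"tolerates loss of {tolerated}")
```

With 2 nodes no failure is tolerated, with 3 nodes one is. Note that if 8
master-eligible nodes are split 4/4 across two datacenters, losing either
datacenter leaves 4 nodes, which is below the quorum of 5.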

Jörg



(John Cherniavsky) #3

SAN question aside - what are the guidelines on balancing CPU/RAM/Storage
so that no one thing is the obvious bottleneck?

I know it depends on workload, so

  • For aggregation heavy workloads, about how much RAM : Storage?

  • For high volume, but smaller queries (individual log retrieval), what's
    the right CPU : Storage for spinning disk? Too much CPU and all the extra
    queries are waiting on the disks to return, too much disk and the CPU can't
    keep up (or does that never happen?)

Obviously every configuration is different - so does anyone have guidelines
or past experience?



(Mark Walkom) #4

Heavy aggregations = lots of RAM.
For storage, use SSD if you can.

The only rule of thumb is get the best possible hardware that you can
afford.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Jörg Prante) #5

Your question is about the ratio "RAM amount / storage capacity". There is
no "yes" or "no" answer, because:

  1. You can combine even tiny RAM with terabytes of storage, or tiny storage
    with huge RAM. ES will work - but what is it good for? The question is what
    query types you use and what query load you expect. An aggregation-heavy
    workload also depends on the number and type of documents and the type of
    aggregations, so you must test each aggregation query to see whether the
    memory footprint and resources are sufficient.

  2. The more RAM you add, the more you can assign to search and aggregation.
    This is a question of setup: how you want to balance ES resources, how
    often you update indexes, etc. Search is not everything; indexing also
    counts.

  3. The more storage you add, the better for indexing a large number of docs.
    Adding storage does not improve overall performance. If you add slow
    storage with spinning disks, you have to expect longer indexing times,
    and when you fetch documents, higher response times. If you want maximum
    speed, go for SSD.

With just commodity hardware, you can get very good performance out of ES.

Jörg


