Recommendations needed for large ELK system design

Hello,

We wish to set up an entire ELK system with the following features:

  • Input from Logstash shippers located on 400 Linux VMs. Only a handful
    of log sources on each VM.
  • Data retention for 30 days, which is roughly 2TB of data in indexed ES
    JSON form (not including replica shards)
  • Estimated input data rate of 50 messages per second at peak hours.
    Mostly short or medium length one-line messages but there will be Java
    traces and very large service responses (in the form of XML) to deal with
    too.
  • The entire system would be on our company LAN.
  • The stored data will be a mix of application logs (info, errors etc)
    and server stats (CPU, memory usage etc) and would mostly be accessed
    through Kibana.

This is our current plan:

  • Have the LS shippers perform minimal parsing (though they would handle
    multiline). Have them point to two load-balanced servers running Redis
    and LS indexers, which would do all the parsing (a rough config sketch of
    this pipeline follows after this list).
  • 2 replica shards for each index, which ramps the total data storage up
    to 6TB
  • ES cluster spread over 6 nodes. Each node is 1TB in size
  • LS indexers pointing to cluster.
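
To make the plan concrete, here is a rough sketch of the shipper and indexer
configs we have in mind (Logstash 1.4-style syntax; the host names, file paths
and the multiline pattern are placeholders we have not settled on yet):

  # shipper.conf -- runs on each of the 400 VMs (minimal parsing, multiline only)
  input {
    file {
      path => "/var/log/myapp/*.log"   # placeholder path
      type => "applog"
    }
  }
  filter {
    multiline {
      # stitch indented lines (Java stack traces, wrapped XML) onto the
      # previous event before shipping
      pattern => "^\s"
      what => "previous"
    }
  }
  output {
    redis {
      # the two load-balanced Redis/indexer servers
      host => ["redis-ls-1", "redis-ls-2"]
      data_type => "list"
      key => "logstash"
    }
  }

  # indexer.conf -- runs on each Redis/indexer server, does all the parsing
  input {
    redis {
      host => "127.0.0.1"
      data_type => "list"
      key => "logstash"
    }
  }
  filter {
    grok {
      # placeholder pattern; the real parsing rules would live here
      match => [ "message", "%{GREEDYDATA:raw_message}" ]
    }
  }
  output {
    elasticsearch {
      protocol => "http"
      host => "es-node-1"   # placeholder ES node
    }
  }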

So I have a couple of questions regarding the setup and would greatly
appreciate the advice of someone with experience!

  1. Does the balance between the number of nodes, the number of replica
    shards, and storage size of each node seem about right? We use
    high-performance equipment and would expect minimal downtime.

  2. What is your recommendation for the system design of the LS indexers
    and Redis? I've seen various designs with each indexer assigned to a single
    Redis, or all indexers reading from all Redises.

  3. Leading from the previous question, what would your recommended data
    size for the Redis servers be?

  4. Not sure what to do about master/data nodes. Assuming all the nodes
    are on identical hardware, would it be beneficial to have a node that is
    only a master and only handles requests?

  5. Do we need to do any additional load balancing on the ES nodes?

We are open to any and all suggestions. We have not yet committed to any
particular design so can change if needed.

Thank you for your time and responses,
Alex


1 - Looks ok, but why two replicas? You're chewing up disk for what reason?
Extra comments below.
2 - It's personal preference really and depends on how your endpoints send
to redis.
3 - 4GB for redis will cache quite a lot of data if you're only doing 50
events p/s (ie hours or even days based on what I've seen); a redis.conf
sketch follows below.
4 - No, spread it out to all the nodes. More on that below though.
5 - No it will handle that itself. Again, more on that below though.
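
For 3, if you want to cap Redis explicitly, a couple of lines in redis.conf
will do it (4gb is just the figure above; noeviction makes Redis push back
rather than silently drop queued events):

  # redis.conf sketch -- cap memory and refuse new writes rather than
  # evicting queued log events if the cap is ever reached
  maxmemory 4gb
  maxmemory-policy noeviction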

Suggestions:
Set your indexes to (multiples of) 6 shards, ie one per node, as that spreads
the query load. I say "multiples of" in that you can set it to 12 shards per
index to start and still easily scale the node count while spreading the load
(a rough template sketch follows after these suggestions).
Split your stats and your log data into different indexes; it'll make
management and retention easier.
You can consider a master-only node or (ideally) three that also handle
queries.
Preferably have an odd number of master-eligible nodes, whether you make
them VMs or physical machines; that way you can ensure quorum is reached with
minimal fuss and avoid split brain.
If you use VMs for master + query nodes then you might want to look at load
balancing the queries via an external service.
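
As a rough sketch of the shard/replica settings and the logs/stats split,
something like this as index templates would do it (the logs-*/stats-* names
and the single replica are examples to adapt, not a prescription):

  curl -XPUT 'http://localhost:9200/_template/logs' -d '{
    "template": "logs-*",
    "settings": { "number_of_shards": 6, "number_of_replicas": 1 }
  }'

  curl -XPUT 'http://localhost:9200/_template/stats' -d '{
    "template": "stats-*",
    "settings": { "number_of_shards": 6, "number_of_replicas": 1 }
  }'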

To give you an idea, we have a 27-node cluster: 3 masters that also handle
queries and 24 data nodes. The masters are 8GB with small disks; the data
nodes are 60GB RAM (30GB heap) with 512GB of disk.
We're running with one replica and have 11TB of logging data. At a high
level we're running out of disk more than heap or CPU, and we're very
write-heavy, with an average of 1K events p/s and comparatively minimal reads.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


Hello Mark,

Thank you for your reply, it certainly helps to clarify many things.

Of course I have some new questions for you!

  1. I haven't looked into it much yet, but I'm guessing Curator can handle
    different index naming schemes, e.g. logs-2014.06.30 and stats-2014.06.30
    (a rough output sketch for this split follows after these questions).
    We'd actually want to store the stats data for 2 years and the logs for
    90 days, so it would indeed be helpful to split the data into different
    index sets. Do you use Curator?

  2. You say that you have "3 masters that also handle queries"... but I
    thought handling queries was all masters did? What is a master node that
    doesn't handle queries? Should we have search load-balancer nodes, i.e.
    nodes that are neither master nor data nodes?

  3. In the interest of reducing the number of node combinations for us to
    test, would you say, then, that 3 master (and query?) only nodes plus the
    6 1TB data-only nodes would be good?

  4. Quorum and split brain are new to me. This webpage
    http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/ about
    split brain recommends setting discovery.zen.minimum_master_nodes equal
    to N/2 + 1. This formula is similar to the one given in the
    documentation for quorum
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency:
    "index operations only succeed if a quorum (>replicas/2+1) of active shards
    are available". I completely understand the split brain issue, but not
    quorum. Is quorum handled automatically or should I change some settings?
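
For reference, this is roughly how I imagine the indexer output routing the
two index sets (the "stats" type tag and the host name are assumptions on my
part, not anything we've built yet):

  output {
    if [type] == "stats" {
      elasticsearch {
        protocol => "http"
        host => "es-node-1"
        index => "stats-%{+YYYY.MM.dd}"
      }
    } else {
      elasticsearch {
        protocol => "http"
        host => "es-node-1"
        index => "logs-%{+YYYY.MM.dd}"
      }
    }
  }

Curator (or a scheduled delete job) could then expire the two prefixes on
their own schedules.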

Thanks again for your help, we appreciate your time and knowledge!
Regards,
Alex


1 - Curator FTW.
2 - Masters handle cluster state, shard allocation and a whole bunch of
other stuff around managing the cluster and its members and data. A node
with both master and data set to false is considered a search node. But the
role of being a master is not onerous, so it made sense for us to double up
the roles. We then just round-robin any queries to these three masters.
3 - Yes, but... it's entirely dependent on your environment. If you're
happy with that and you can get the go-ahead then see where it takes you.
4 - Quorum is automatic, and having the n/2+1 means that a majority of
nodes has to take part in an election, which reduces the possibility of
split brain. If you set the discovery settings then you are also
essentially setting the quorum settings.
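
Concretely, with three master-eligible nodes that works out to a single line
in elasticsearch.yml on every node:

  # quorum = (3 master-eligible nodes / 2) + 1 = 2
  discovery.zen.minimum_master_nodes: 2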

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


You can further simplify your architecture by using rsyslog with
omelasticsearch instead of LS.
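
Very roughly, something like this on each box (rsyslog 7+ syntax; the server,
port and index name are placeholders, and in practice you'd add a JSON
template and a dynamic per-day index):

  # /etc/rsyslog.d/10-elasticsearch.conf (sketch)
  module(load="omelasticsearch")

  *.* action(type="omelasticsearch"
             server="es-node-1"
             serverport="9200"
             searchIndex="system-logs"
             bulkmode="on")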

This might be handy:

Otis

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

tel: +1 347 480 1610 fax: +1 718 679 9190


Ok thank you Mark, you've been extremely helpful and we now have a better
idea about what we're doing!

-Alex


Hi Mark,

I've done more investigating and it seems that a Client (AKA Query) node
cannot also be a Master node. As it says here:

Nodes can be excluded from becoming a master by setting node.master to
false. Note, once a node is a client node (node.client set to true), it
will not be allowed to become a master (node.master is automatically set to
false).

And from the elasticsearch.yml config file it says:

# 2. You want this node to only serve as a master: to not store any data and
#    to have free resources. This will be the "coordinator" of your cluster.
#
#node.master: true
#node.data: false
#
# 3. You want this node to be neither master nor data node, but
#    to act as a "search load balancer" (fetching data from nodes,
#    aggregating results, etc.)
#
#node.master: false
#node.data: false

So I'm wondering how exactly you set up your client nodes to also be master
nodes. It seems like a master node can only be either purely a master or a
master + data node.

Regards, Alex
