Elasticsearch hardware requirements and benchmarking


(haries fajar nugroho) #1

I have created an Elasticsearch cluster (5 shards, 1 replica, 65536
max_file_descriptors, mlockall enabled, the same value for es_min_mem and
es_max_mem, 1 node acting as both master and data node, 1 node as a data-only
node, and 1.5-3 GB of logs per hour to be processed). But I have some
questions on my mind:

  1. Hardware limitations (not all of my available hardware is five-star
    hardware). By five-star hardware I mean SSDs, lots of RAM and a Gigabit
    interface.

So do you have any information on the most suitable hardware for each of
these nodes:

  • master node: does this node need an SSD, lots of RAM, a Gigabit interface?
  • data node: does this node need an SSD, lots of RAM, a Gigabit interface?
  • client node: does this node need an SSD, lots of RAM, a Gigabit interface?
  • Kibana: should it be connected to a client node, a data node or the master
    node? Until now, I have only connected Kibana to the master node.

In my understanding, a data node should have an SSD and lots of RAM; a master
node needs a Gigabit interface and a medium amount of RAM but doesn't really
need an SSD; and a client node needs a Gigabit interface but neither lots of
RAM nor an SSD. Correct me if I'm wrong.

  2. Testing the configuration of my implementation. Do you have any
    recommended test scenarios so I can measure my Elasticsearch cluster's
    performance: how many query requests it can handle, how much indexing it
    can process, and how fast queries can be performed? Until now I'm still
    using Bigdesk as my cluster monitoring tool.
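Before benchmarking, it can help to confirm that the settings listed above actually took effect on every node. The sketch below assumes the Elasticsearch 1.x `GET /_nodes/process` response, in which each node's `process` section reports `mlockall` (false if the memory lock failed, for example because of the memlock ulimit) and `max_file_descriptors`. The 65536 threshold is taken from this post; the function names and host are illustrative assumptions.

```python
import json
import urllib.request


def fetch_json(url):
    """GET a URL and decode the JSON body (no auth, no retries)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def check_process_settings(nodes_process, min_fds=65536):
    """Return human-readable problems found in a parsed /_nodes/process response.

    Assumes the ES 1.x shape: {"nodes": {id: {"name": ..., "process": {...}}}}.
    Missing fields are treated as "not set".
    """
    problems = []
    for node_id, info in nodes_process.get("nodes", {}).items():
        name = info.get("name", node_id)
        process = info.get("process", {})
        if not process.get("mlockall", False):
            problems.append(f"{name}: mlockall is not active")
        fds = process.get("max_file_descriptors", 0)
        if fds < min_fds:
            problems.append(f"{name}: max_file_descriptors={fds} < {min_fds}")
    return problems
```

Usage, against a hypothetical host: `check_process_settings(fetch_json("http://localhost:9200/_nodes/process"))` returns an empty list when every node reports mlockall as active and at least 65536 file descriptors.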

Regards,

--
Please update your bookmarks! We moved to https://discuss.elastic.co/

You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/547f879b-290c-4994-9a1a-28dd6de4fad5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

None of the nodes needs super high performance hardware, but they will
definitely benefit from it. Given the ubiquity of gigabit ethernet, you
shouldn't be going for any less. 10GbE is nice if you can afford it.

Master-only nodes don't need to be large; a few CPU cores, a few GB of RAM
and minimal disk are enough, since they don't hold any data.
The same goes for client-only nodes, though you probably want more RAM on
them to allow for querying.
Data-only nodes are the ones you want to invest your budget in: they want
fast disk for retrieval, lots of CPU for indexing and lots of RAM for both.
Kibana can go on a client node. We don't recommend sending queries or
indexing via master-only nodes.
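In Elasticsearch 1.x the node roles described above come down to two settings in `elasticsearch.yml`, `node.master` and `node.data`, both of which default to true; a client node is one with both set to false. The sketch below only renders those two lines per role; the role labels used as dictionary keys are made up for illustration.

```python
# elasticsearch.yml settings for each node role in Elasticsearch 1.x.
# The role labels used as keys are hypothetical names for this sketch.
ROLE_SETTINGS = {
    "master_only": {"node.master": True, "node.data": False},
    "data_only": {"node.master": False, "node.data": True},
    "client": {"node.master": False, "node.data": False},
    "master_and_data": {"node.master": True, "node.data": True},  # 1.x default
}


def role_yaml(role):
    """Render the two elasticsearch.yml lines that give a node the given role."""
    settings = ROLE_SETTINGS[role]
    # YAML booleans are lowercase, so render True/False as true/false.
    return "\n".join(f"{key}: {str(value).lower()}" for key, value in settings.items())
```

For example, `print(role_yaml("data_only"))` prints the two lines for a data node's `elasticsearch.yml`; Kibana would then be pointed at the HTTP address of the node configured as `client`.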

As for testing, you really need to do that with your own use case: what you
expect your data and your queries to look like. This is really domain
knowledge.
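Since the numbers have to come from your own data and queries, only the measuring part can be kept generic. Below is a minimal latency-benchmark sketch, assuming you capture a representative query body (for example one that Kibana sends) and replay it; `make_search`, the URL and the iteration count are illustrative choices, not an Elastic tool. It reports wall-clock percentiles, which include network and HTTP overhead on top of the `took` time Elasticsearch returns in each search response.

```python
import json
import math
import statistics
import time
import urllib.request


def make_search(url, body):
    """Return a zero-argument callable that POSTs one search request."""
    data = json.dumps(body).encode("utf-8")

    def run():
        req = urllib.request.Request(
            url, data=data, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    return run


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def benchmark(run_query, iterations=50):
    """Call run_query repeatedly and summarise wall-clock latency in seconds."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    timings.sort()
    return {
        "count": len(timings),
        "mean_s": statistics.mean(timings),
        "p50_s": percentile(timings, 50),
        "p95_s": percentile(timings, 95),
        "max_s": timings[-1],
    }
```

Usage, with a hypothetical host and index pattern: `benchmark(make_search("http://client-node:9200/logstash-*/_search", {"query": {"match_all": {}}}))`. Repeating the same query warms caches, so the first run and later runs are worth looking at separately.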

PS - We're moving to https://discuss.elastic.co/, please join us there for
any future discussions!

On 10 May 2015 at 18:11, haries fajar nugroho hariesfn@gmail.com wrote:


(haries fajar nugroho) #3

Hi Mark,

Thanks for sharing. So it's best if Kibana talks to a client node. But which is better: Kibana talking to the master node, or Kibana talking directly to the data nodes? Also, my queries currently take 5-6 seconds and I want to bring that down to 1-2 seconds. Which is better: adding more nodes, or adding more RAM to the existing nodes?
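Part of the "more nodes or more RAM" question is how shards spread. With 5 primaries and 1 replica, each index has 10 shard copies, and in Elasticsearch 1.x the number of primaries cannot be changed after an index is created. The arithmetic below is a sketch that assumes the allocator balances copies evenly: adding data nodes spreads an index only until every node holds one copy, and beyond that extra nodes hold no part of that index. It says nothing about the cost of the queries themselves.

```python
import math


def shard_spread(primaries, replicas, data_nodes):
    """Rough per-index shard distribution, assuming even balancing.

    Elasticsearch never places two copies of the same shard on one node,
    so an index can only be green with at least 1 + replicas data nodes.
    """
    if data_nodes < 1:
        raise ValueError("need at least one data node")
    copies = primaries * (1 + replicas)
    return {
        "shard_copies": copies,
        "max_copies_per_node": math.ceil(copies / data_nodes),
        "nodes_without_a_copy": max(0, data_nodes - copies),
        "can_be_green": data_nodes >= 1 + replicas,
    }
```

For the setup in this thread, `shard_spread(5, 1, 2)` gives 10 copies with 5 per node.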

Regards,


(Mark Walkom) #6

As I said, we don't recommend sending queries, which includes those
generated by Kibana, to master-only nodes. You would be better off sending
them to data nodes.

As for your performance problems, that's a multi-layered problem that may
not be solved just by adding more nodes. You need to provide more
information about your cluster setup.

On 10 May 2015 at 22:16, haries fajar nugroho hariesfn@gmail.com wrote:


(haries fajar nugroho) #7

Hi Mark,

Thanks, I really appreciate your response. Previously I thought it was
better for Kibana to communicate with the master rather than a data node,
but I was wrong.

So this is my setup; I hope it answers your question:

Currently my cluster contains 2 nodes (node a and node b). Node a is the
master node, a data node and also the Logstash server; node b is a data-only
node. Both nodes have the same disk space, the same amount of RAM and the
same network interface on the same subnet. I allocated 2 GB of heap to each
node, the nodes use unicast to talk to each other, and the indices have 5
shards and 1 replica. max_file_descriptors is 65536, mlockall is enabled,
and 1.5-3 GB of logs per hour need to be processed. Both nodes run
Elasticsearch 1.5.2, and the logs are sent by logstash-forwarder from 2
servers.
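For context on the heap size: the commonly cited Elasticsearch guideline is to give the heap no more than about half of the machine's RAM, leaving the rest to the OS file-system cache that Lucene relies on, and to stay below roughly 32 GB so the JVM can keep using compressed object pointers. The RAM size of these machines isn't given in the thread, so the inputs below are hypothetical, and the 31 GB ceiling is a conservative assumption. In 1.x, the `ES_HEAP_SIZE` environment variable sets the minimum and maximum heap to the same value.

```python
def suggested_heap_gb(ram_gb, ceiling_gb=31):
    """Rule-of-thumb heap size in whole GB: half of RAM, capped below ~32 GB.

    The ceiling is a conservative assumption; the exact compressed-oops
    cutoff depends on the JVM.
    """
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    return min(int(ram_gb // 2), ceiling_gb)


def heap_env_line(ram_gb):
    """Shell assignment for Elasticsearch 1.x, which reads ES_HEAP_SIZE at startup."""
    return f"ES_HEAP_SIZE={suggested_heap_gb(ram_gb)}g"
```

For example, a hypothetical 16 GB machine gives `ES_HEAP_SIZE=8g`.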

Regards,

On Monday, May 11, 2015 at 4:43:17 AM UTC+7, Mark Walkom wrote:


(Mark Walkom) #8

What about data volumes, index and search rates?
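Data volume matters mostly as disk and cache pressure. As a rough sketch of the arithmetic, 1.5-3 GB of raw logs per hour is 36-72 GB per day, and with 1 replica the cluster stores every document twice. The ratio between raw log size and on-disk index size is an assumption here (1.0 by default); it depends on mappings, `_source`, `_all` and so on, and has to be measured on real data.

```python
def daily_storage_gb(gb_per_hour, replicas=1, index_to_raw_ratio=1.0):
    """Rough cluster-wide disk used per day of logs.

    index_to_raw_ratio is an unknown you must measure on your own indices.
    """
    return gb_per_hour * 24 * (1 + replicas) * index_to_raw_ratio


def per_node_gb(total_gb, data_nodes):
    """Per-node share, assuming shards (and therefore data) spread evenly."""
    return total_gb / data_nodes
```

With the thread's figures, `daily_storage_gb(1.5)` and `daily_storage_gb(3)` give 72 and 144 GB per day across the cluster, before measuring the real index-to-raw ratio.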

On 11 May 2015 at 11:40, haries fajar nugroho hariesfn@gmail.com wrote:


(haries fajar nugroho) #9

Hi Mark,

By data volume, do you mean the log files that need to be processed? If so,
it is 1.5-3 GB/hour. If by index rate you mean "Indexing requests per second"
in Bigdesk, it is 74839326/second. If search rate means "Search requests per
second" in Bigdesk, it is 500-700 queries/second. I have now built a better
cluster layout: the master node indexes to 2 data nodes, and Kibana talks to
a client node. But queries now take about 18-21 seconds. By a query I mean
changing the time filter in Kibana (from 4 hours to 1 hour, or changing the
absolute time filter) until the data/visualizations finish loading.
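One way to double-check these figures: 74839326 documents per second would be far more than 1.5-3 GB of logs per hour can contain, so that Bigdesk number looks like a cumulative total rather than a rate. A rate can be derived by sampling the cumulative counters twice. The sketch below assumes the 1.x `_nodes/stats` field names given in the docstring. These counters are kept per shard copy, so with replicas and 5 shards per index a single request is likely counted several times; the results are best treated as relative rather than exact.

```python
def rate_per_second(count_before, count_after, seconds_elapsed):
    """Turn two samples of a cumulative counter into an average per-second rate."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    if count_after < count_before:
        raise ValueError("counter went backwards; was a node restarted?")
    return (count_after - count_before) / seconds_elapsed


def cluster_totals(nodes_stats):
    """Sum the cumulative indexing and search counters over all nodes.

    Assumes the 1.x GET /_nodes/stats/indices shape:
    nodes.<id>.indices.indexing.index_total and
    nodes.<id>.indices.search.query_total. Missing sections count as 0.
    """
    index_total = 0
    query_total = 0
    for node in nodes_stats.get("nodes", {}).values():
        indices = node.get("indices", {})
        index_total += indices.get("indexing", {}).get("index_total", 0)
        query_total += indices.get("search", {}).get("query_total", 0)
    return index_total, query_total
```

Taking `cluster_totals` twice, say 60 seconds apart, and feeding each counter into `rate_per_second` gives average indexing and shard-level query rates over that window.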

Regards,

On Tuesday, May 12, 2015 at 6:36:51 AM UTC+7, Mark Walkom wrote:


(haries fajar nugroho) #10

Hi Mark,

Here is some more information in addition to my previous email. I have
changed the configuration: the heap size is now 6 GB on each data node. So
my cluster contains 2 data nodes: es-system-datanode_a and
es-syste-datanode_b. The cluster is still yellow, because the data nodes are
still synchronizing with each other after I added more memory to them.
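Yellow is likely expected while that recovery runs. Roughly: a cluster is red if any primary shard is not active, yellow if all primaries are active but some replica is not, and green otherwise. A replica is never placed on the same node as its primary, so with 1 replica and 2 data nodes the replicas can only be active while both nodes are up. The sketch below applies that simplified rule to `GET /_cat/shards` output, assuming the default column order (index, shard, prirep, state, ...) without the `v` header line; it is not how Elasticsearch computes health internally.

```python
# Shard states counted as "active" for health purposes in this sketch.
ACTIVE_STATES = {"STARTED", "RELOCATING"}


def parse_cat_shards(text):
    """Extract (prirep, state) pairs from _cat/shards output without a header."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        rows.append((parts[2], parts[3]))  # prirep is "p" or "r"
    return rows


def health_colour(shards):
    """Simplified cluster colour from (prirep, state) pairs."""
    colour = "green"
    for prirep, state in shards:
        if state in ACTIVE_STATES:
            continue
        if prirep == "p":
            return "red"  # a primary is not active: some data is unavailable
        colour = "yellow"  # only replicas are missing or still recovering
    return colour
```

Watching `_cat/shards` until no replica is left in `INITIALIZING` or `UNASSIGNED` should coincide with the cluster turning green.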

Regards,
Haries

On Tuesday, May 12, 2015 at 8:02:01 AM UTC+7, haries fajar nugroho wrote:

