Thick memory single node vs thin memory multi node

Hi,

I have a server with 256 GB of RAM and I currently run a single node with 30 GB. My app indexes heavily and is also queried a lot by users.
The question is: what is the preferred deployment for this scenario, a single node with lots of RAM or a cluster of smaller nodes (say, 3 nodes with 10 GB each)?
Note that I don't care about resiliency (replication), but I do care about capacity and query time.

thanks

Well, replication would actually help with query time, but setting that topic aside: lots of spare memory for Linux to use as file cache and buffers is really good, and you have the hardware to do it.

Query times will be affected by your ability to cache them and to keep indexing data in memory.
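For reference, a quick way to see how much heap each node is using versus how much memory the OS still has free (and can therefore spend on the filesystem cache) is the nodes stats API. A minimal Python sketch, assuming Elasticsearch is reachable on localhost:9200 with no authentication:

```python
# Minimal sketch: compare JVM heap usage with the memory the OS reports as
# free (i.e. what is left over for the filesystem cache and buffers).
# Assumes Elasticsearch on localhost:9200 with no auth.
import requests

resp = requests.get("http://localhost:9200/_nodes/stats/jvm,os")
resp.raise_for_status()

for node in resp.json()["nodes"].values():
    heap_used = node["jvm"]["mem"]["heap_used_in_bytes"]
    heap_max = node["jvm"]["mem"]["heap_max_in_bytes"]
    os_free = node["os"]["mem"]["free_in_bytes"]
    print(f"{node['name']}: heap {heap_used / 2**30:.1f}/{heap_max / 2**30:.1f} GiB, "
          f"OS free {os_free / 2**30:.1f} GiB")
```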

So, trying to keep it simple, with what I know as an example:

I index about 700M events every day (~1 TB of logs, or ~10K messages a second).

For indexing I don't find needing a lot of heap to be a big issue; I run at 30 GB. (Don't go over 32 GB, as there is a performance impact until you get to around, I think, 70 GB.)

My biggest issue was the EXT4 filesystem. Even though my disk array can write faster, I found the EXT4 journal bottlenecked at about 300Mb (JBD2 is not threaded and can only use one core), so I had to split the data across multiple mount points, and wham, my search and indexing performance went off the charts. XFS probably has similar issues where you'd want multiple mounts, but I hear it is more efficient.
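If you go down that road, path.data accepts a list of directories, and the fs section of the nodes stats shows each configured data path, which is an easy way to confirm data really is spread across the mounts. A rough Python sketch, again assuming localhost:9200 with no auth:

```python
# Rough sketch: list each configured data path per node so you can check that
# data is actually spread across multiple mount points.
# Assumes Elasticsearch on localhost:9200 with no auth.
import requests

resp = requests.get("http://localhost:9200/_nodes/stats/fs")
resp.raise_for_status()

for node in resp.json()["nodes"].values():
    print(node["name"])
    for data_path in node["fs"]["data"]:
        total = data_path["total_in_bytes"]
        used = total - data_path["available_in_bytes"]
        print(f"  {data_path['path']}: {used / 2**30:.1f} GiB used "
              f"of {total / 2**30:.1f} GiB")
```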

Now there are two things I would look at.
Your thread pools (see the quick check sketched after the links below):
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html

And heap sizes; something to read:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html
https://www.elastic.co/guide/en/elasticsearch/reference/master/indexing-buffer.html
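On the thread pool side, before tweaking anything it's worth checking whether the pools are actually rejecting work; a non-zero rejected count on the bulk or search pools is usually the first warning sign. A minimal sketch using the cat thread pool API (assuming localhost:9200, no auth):

```python
# Sketch: dump active/queued/rejected counts for every thread pool.
# Assumes Elasticsearch on localhost:9200 with no auth.
import requests

resp = requests.get(
    "http://localhost:9200/_cat/thread_pool",
    params={"v": "true", "h": "node_name,name,active,queue,rejected"},
)
resp.raise_for_status()
print(resp.text)
```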

You can also read up on this page; even though it is for 2.3, it still mostly applies (maybe they have updated it for 5.1):
https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html

But before you go tweaking any of the internal parameters: for the most part they are pretty darn good and auto-scale, so you will probably not see huge changes by adjusting them.

The biggest thing I see over and over again is to watch your:

CPU utilization
Load average (the lower the better; if you can't find the bottleneck, keep searching)
Disk I/O (huge)
Ratio of index size to number of shards: a 1 TB index with 1 shard and 1 CPU is very, very slow; a 1 TB index with 100 shards and 100 CPUs is fast (with lots of mount points). See the sketch below.
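To illustrate the shard point: the number of primary shards is set at index creation time and can't be changed without reindexing, so it's one of the few things to get roughly right up front. A hedged sketch (the index name and shard count are placeholders; 0 replicas matches the "no resiliency" constraint in the original question; assumes localhost:9200, no auth):

```python
# Sketch: create an index with several primary shards so indexing and search
# can use more cores / mount points in parallel. "my-heavy-index" and the
# shard count are placeholders. Assumes Elasticsearch on localhost:9200, no auth.
import requests

settings = {
    "settings": {
        "index": {
            "number_of_shards": 6,    # tune to the cores / mount points available
            "number_of_replicas": 0,  # no replication, per the original question
        }
    }
}

resp = requests.put("http://localhost:9200/my-heavy-index", json=settings)
resp.raise_for_status()
print(resp.json())
```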

Hope this helps

Thanks for the reply, eperry.
My indexing speed is a lot slower; I index 20M a day.
I guess that's because my events are thicker in data than your typical logs.
But you haven't answered my original question: which is best here, a cluster of smaller nodes or a single big node?

If you are deploying on a single server, I would expect a single big node to be better, at least as long as you do not suffer from heap pressure. This eliminates the need to communicate between the nodes and should therefore be more efficient.

Interesting that you raise the heap issue. Lately I've noticed some long "old" GCs in the node logs.
In general, with the default GC, the bigger the heap, the longer the GC pauses.
Is this something that can be avoided by using smaller nodes?
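(Side note: the same "old" GC numbers can also be read from the nodes stats API instead of the logs; a minimal sketch, assuming the node listens on localhost:9200 with no auth:)

```python
# Sketch: report old-generation GC count and total pause time per node.
# Assumes Elasticsearch on localhost:9200 with no auth.
import requests

resp = requests.get("http://localhost:9200/_nodes/stats/jvm")
resp.raise_for_status()

for node in resp.json()["nodes"].values():
    old_gc = node["jvm"]["gc"]["collectors"]["old"]
    print(f"{node['name']}: {old_gc['collection_count']} old GCs, "
          f"{old_gc['collection_time_in_millis'] / 1000:.1f} s total")
```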

It is hard to tell whether the improvement in GC patterns would outweigh the added communication between nodes. I would recommend benchmarking it.
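A crude way to do that comparison is to run the same query repeatedly against each deployment and compare the latency distribution. A rough sketch (the index name and query are placeholders, and a dedicated benchmarking tool such as Rally would be more rigorous; assumes localhost:9200, no auth):

```python
# Rough latency benchmark sketch: run one query many times and report
# median and approximate p95 wall-clock time. Index name and query are
# placeholders. Assumes Elasticsearch on localhost:9200 with no auth.
import statistics
import time

import requests

QUERY = {"query": {"match_all": {}}, "size": 10}

timings = []
for _ in range(100):
    start = time.perf_counter()
    resp = requests.get("http://localhost:9200/my-heavy-index/_search", json=QUERY)
    resp.raise_for_status()
    timings.append(time.perf_counter() - start)

timings.sort()
print(f"median {statistics.median(timings) * 1000:.1f} ms, "
      f"p95 {timings[94] * 1000:.1f} ms")
```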

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.