Thick memory single node vs thin memory multi node

Hi,

I have a server with 256 GB of RAM and I currently run a single node with 30 GB. My app indexes heavily and is also queried a lot by users.
The question is: what is the preferred deployment for this scenario, a single node with lots of RAM or a cluster of smaller nodes (say, 3 nodes with 10 GB each)?
Note that I don't care about resiliency (replication), but I do care about capacity and query time.

thanks

Well, replication would actually help with query time, but setting that topic aside: lots of spare memory for Linux to use as file cache and buffers is really good, and you have the hardware to do it.

Query times will be affected by your ability to cache them and to keep indexing data in memory.
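For reference, a quick way to see how much heap each node is using versus how much memory the OS still has free (and can therefore spend on the filesystem cache) is the nodes stats API. A minimal Python sketch, assuming Elasticsearch is reachable on localhost:9200 with no authentication:

```python
# Minimal sketch: compare JVM heap usage with the memory the OS reports as
# free (i.e. what is left over for the filesystem cache and buffers).
# Assumes Elasticsearch on localhost:9200 with no auth.
import requests

resp = requests.get("http://localhost:9200/_nodes/stats/jvm,os")
resp.raise_for_status()

for node in resp.json()["nodes"].values():
    heap_used = node["jvm"]["mem"]["heap_used_in_bytes"]
    heap_max = node["jvm"]["mem"]["heap_max_in_bytes"]
    os_free = node["os"]["mem"]["free_in_bytes"]
    print(f"{node['name']}: heap {heap_used / 2**30:.1f}/{heap_max / 2**30:.1f} GiB, "
          f"OS free {os_free / 2**30:.1f} GiB")
```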

So, trying to keep it simple, with what I know as an example:

I index about 700M events every day (~1 TB of logs, or ~10K messages a second).

For indexing I don't find needing a lot of heap to be a big issue; I run at 30 GB. (Don't go over 32 GB, as there is a performance impact until you get to around, I think, 70 GB.)

My biggest issue was the EXT4 filesystem. Even though my disk array can write faster, I found the EXT4 journal bottlenecked at about 300Mb (JBD2 is not threaded and can only use one core), so I had to split the data across multiple mount points, and wham, my search and indexing performance went off the charts. XFS probably has similar issues where you'd want multiple mounts, but I hear it is more efficient.
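If you go down that road, path.data accepts a list of directories, and the fs section of the nodes stats shows each configured data path, which is an easy way to confirm data really is spread across the mounts. A rough Python sketch, again assuming localhost:9200 with no auth:

```python
# Rough sketch: list each configured data path per node so you can check that
# data is actually spread across multiple mount points.
# Assumes Elasticsearch on localhost:9200 with no auth.
import requests

resp = requests.get("http://localhost:9200/_nodes/stats/fs")
resp.raise_for_status()

for node in resp.json()["nodes"].values():
    print(node["name"])
    for data_path in node["fs"]["data"]:
        total = data_path["total_in_bytes"]
        used = total - data_path["available_in_bytes"]
        print(f"  {data_path['path']}: {used / 2**30:.1f} GiB used "
              f"of {total / 2**30:.1f} GiB")
```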

Now there are two things I would look at.
Your thread pools (see the quick check sketched after the links below):
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html

And heap sizes; something to read:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html
https://www.elastic.co/guide/en/elasticsearch/reference/master/indexing-buffer.html
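On the thread pool side, before tweaking anything it's worth checking whether the pools are actually rejecting work; a non-zero rejected count on the bulk or search pools is usually the first warning sign. A minimal sketch using the cat thread pool API (assuming localhost:9200, no auth):

```python
# Sketch: dump active/queued/rejected counts for every thread pool.
# Assumes Elasticsearch on localhost:9200 with no auth.
import requests

resp = requests.get(
    "http://localhost:9200/_cat/thread_pool",
    params={"v": "true", "h": "node_name,name,active,queue,rejected"},
)
resp.raise_for_status()
print(resp.text)
```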

You can also read up on this page; even though it is for 2.3, it still mostly applies (maybe they have updated it for 5.1):
https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html

But before you go tweaking any of the internal parameters: for the most part they are pretty darn good and auto-scale, so you will probably not see huge changes by adjusting them.

The biggest thing I see over and over again is to watch your:

CPU utilization
Load average (the lower the better; if you can't find the bottleneck, keep searching)
Disk I/O (huge)
Ratio of index size to number of shards: a 1 TB index with 1 shard and 1 CPU is very, very slow; a 1 TB index with 100 shards and 100 CPUs is fast (with lots of mount points). See the sketch below.
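To illustrate the shard point: the number of primary shards is set at index creation time and can't be changed without reindexing, so it's one of the few things to get roughly right up front. A hedged sketch (the index name and shard count are placeholders; 0 replicas matches the "no resiliency" constraint in the original question; assumes localhost:9200, no auth):

```python
# Sketch: create an index with several primary shards so indexing and search
# can use more cores / mount points in parallel. "my-heavy-index" and the
# shard count are placeholders. Assumes Elasticsearch on localhost:9200, no auth.
import requests

settings = {
    "settings": {
        "index": {
            "number_of_shards": 6,    # tune to the cores / mount points available
            "number_of_replicas": 0,  # no replication, per the original question
        }
    }
}

resp = requests.put("http://localhost:9200/my-heavy-index", json=settings)
resp.raise_for_status()
print(resp.json())
```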

Hope this helps

Thanks for the reply, eperry.
My indexing speed is a lot slower; I index 20M a day.
I guess that's because my events are thicker in data than your typical logs.
But you haven't answered my original question: which is best here, a cluster of smaller nodes or a single big node?

If you are deploying on a single server, I would expect a single big node to be better, at least as long as you do not suffer from heap pressure. This eliminates the need to communicate between the nodes and should therefore be more efficient.

Interesting that you raise the heap issue. Lately I've noticed some long "old" GCs in the node logs.
In general, with the default GC, the bigger the heap, the longer the GC pauses.
Is this something that can be avoided by using smaller nodes?
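(Side note: the same "old" GC numbers can also be read from the nodes stats API instead of the logs; a minimal sketch, assuming the node listens on localhost:9200 with no auth:)

```python
# Sketch: report old-generation GC count and total pause time per node.
# Assumes Elasticsearch on localhost:9200 with no auth.
import requests

resp = requests.get("http://localhost:9200/_nodes/stats/jvm")
resp.raise_for_status()

for node in resp.json()["nodes"].values():
    old_gc = node["jvm"]["gc"]["collectors"]["old"]
    print(f"{node['name']}: {old_gc['collection_count']} old GCs, "
          f"{old_gc['collection_time_in_millis'] / 1000:.1f} s total")
```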

It is hard to tell whether the improvement in GC patterns would outweigh the added communication between nodes. I would recommend benchmarking it.
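A crude way to do that comparison is to run the same query repeatedly against each deployment and compare the latency distribution. A rough sketch (the index name and query are placeholders, and a dedicated benchmarking tool such as Rally would be more rigorous; assumes localhost:9200, no auth):

```python
# Rough latency benchmark sketch: run one query many times and report
# median and approximate p95 wall-clock time. Index name and query are
# placeholders. Assumes Elasticsearch on localhost:9200 with no auth.
import statistics
import time

import requests

QUERY = {"query": {"match_all": {}}, "size": 10}

timings = []
for _ in range(100):
    start = time.perf_counter()
    resp = requests.get("http://localhost:9200/my-heavy-index/_search", json=QUERY)
    resp.raise_for_status()
    timings.append(time.perf_counter() - start)

timings.sort()
print(f"median {statistics.median(timings) * 1000:.1f} ms, "
      f"p95 {timings[94] * 1000:.1f} ms")
```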

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.