Elasticsearch 2.4 deployment question

Hi,
I have one server for production use. It has a 16-core CPU and 64 GB of memory. The Elasticsearch version is 2.4.4, and my client application is built on Spring Boot 1.5. I have 3 indices and 2 million records. Could someone give me some advice on how to deploy in this environment? For example, should I run a single-node cluster, two nodes on the same server, or do I really need two servers? Thanks.

PS:
I ran some load tests beforehand and tried three cases. Case 1: a single node. Case 2: a 3-node cluster. Case 3: a 5-node cluster. I sent 10k requests per second to the Spring Boot client, which then called the cluster. The server resource usage looked odd to me: CPU usage is above 80% while memory usage is low. Is that reasonable?
Here are the server resource statistics for the 5-node cluster. The process with PID 19291 is the Spring Boot client.

Cpu(s): 84.0%us, 11.8%sy, 0.0%ni, 2.8%id, 0.0%wa, 0.0%hi, 1.3%si, 0.0%st
Mem: 65973216k total, 34067092k used, 31906124k free, 355848k buffers
Swap: 2097148k total, 0k used, 2097148k free, 8421992k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2403 xxxx 20 0 12.8g 1.4g 17m S 329.7 2.2 58:42.34 java
2522 xxxx 20 0 12.9g 1.4g 17m S 289.3 2.2 43:33.81 java
2467 xxxx 20 0 12.8g 1.4g 17m S 257.2 2.2 42:01.08 java
2434 xxxx 20 0 12.9g 1.4g 17m S 254.9 2.2 44:24.92 java
2371 xxxx 20 0 12.9g 1.4g 17m S 253.6 2.2 41:37.90 java
19291 xxxx 20 0 60.4g 15g 14m S 154.3 25.1 303:28.67 java

Hello,

I would set up 2 nodes, each with 16 GB of locked heap.

heap sizing guide

If you don't need high availability, you may consider running the indices without replicas; otherwise, use 1 replica.
2 or 3 primary shards may be a good start.
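
As a rough sketch of the shard and replica suggestion above, the body of a create-index request could be built like this. The helper name `index_settings` and the index name `my_index` in the comment are hypothetical; `number_of_shards` and `number_of_replicas` are the standard index settings.

```python
import json


def index_settings(primary_shards=3, high_availability=False):
    """Build the body for a create-index request (PUT /my_index).

    my_index is a placeholder name. Without high availability the index
    gets 0 replicas; otherwise it gets 1 replica, as suggested above.
    """
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": 1 if high_availability else 0,
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(index_settings(primary_shards=3), indent=2))
```

Keep in mind that the number of primary shards is fixed when the index is created, while the number of replicas can be changed later. Also, a replica held by a second node on the same physical server does not protect against that server failing.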

Elasticsearch will spread requests across all the nodes, so if you have 5 nodes on one machine, an 80% CPU load is not surprising when you send 10k requests per second.
Moreover, if you are indexing while sending requests, the CPU will also take a hit.
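
To see why the numbers are consistent, here is a small worked calculation using the %CPU values from the top output in the first post. It assumes top is in its default mode, where 100% means one fully busy core, so a 16-core machine tops out at 1600%. The function name `machine_share` is just for illustration.

```python
# %CPU values copied from the top output in the first post.
NODE_CPU = {2403: 329.7, 2522: 289.3, 2467: 257.2, 2434: 254.9, 2371: 253.6}
CLIENT_CPU = 154.3  # PID 19291, the Spring Boot client
CORES = 16          # from the server description


def machine_share(cpu_percent_sum, cores=CORES):
    """Convert summed per-process %CPU (100 = one core) to % of the whole machine."""
    return cpu_percent_sum / cores


nodes_total = sum(NODE_CPU.values())
all_total = nodes_total + CLIENT_CPU

if __name__ == "__main__":
    print(f"5 Elasticsearch nodes: {nodes_total:.1f}% = {machine_share(nodes_total):.1f}% of the machine")
    print(f"nodes + client:        {all_total:.1f}% = {machine_share(all_total):.1f}% of the machine")
```

The five nodes together account for roughly 87% of the 16 cores, and adding the client brings the total to about 96%, which is in line with the 84.0% user plus 11.8% system time in the Cpu(s) line.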

Hello Tetrapack,
Thanks for your reply, I really appreciate it. I will try the 2-node cluster later.
As you say, the 80% CPU load is reasonable, and I agree. There is still one thing that confuses me. Look at the statistics for the 5 nodes: I think memory usage is rather low while CPU usage is high. For example, for PID 2403, %CPU is 329.7 and %MEM is 2.2. I guess the memory is not being fully used, or maybe it is just because there is not much data. As you know, there are 2 million documents. What do you think about this? Thank you.

Yeah, 2 million docs over 5 nodes is not much, so heap usage will be low on each node. If you try 2 nodes, heap usage on each node will probably increase, but CPU usage will be lower.

Heap usage will depend on the depth of your documents and on how you query them.

=> Aggregations and sorting need a lot of memory.
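
For reference, the low %MEM figures from the top output can be checked with a little arithmetic. This sketch assumes top reports memory in KiB and that %MEM is RES divided by total physical memory; the function name `percent_of_ram` is hypothetical. Note that RES is the resident memory of the whole JVM process, not the heap usage itself.

```python
TOTAL_KIB = 65973216   # "Mem: 65973216k total" in the top output
NODE_RES_GIB = 1.4     # RES column of each Elasticsearch process
NODE_COUNT = 5


def percent_of_ram(res_gib, total_kib=TOTAL_KIB):
    """Resident memory as a percentage of total physical memory."""
    total_gib = total_kib / (1024 * 1024)
    return 100.0 * res_gib / total_gib


if __name__ == "__main__":
    per_node = percent_of_ram(NODE_RES_GIB)
    all_nodes = percent_of_ram(NODE_RES_GIB * NODE_COUNT)
    print(f"one node:  {per_node:.1f}% of RAM")
    print(f"all nodes: {all_nodes:.1f}% of RAM")
```

So each node keeps about 1.4 GB resident, which matches the 2.2 in the %MEM column, and all five nodes together use only around 11% of the machine's memory. To look at the heap itself rather than resident memory, the cat nodes API exposes a `heap.percent` column per node.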

Got it. Thank you!
