I have a 9-node ES cluster: Q9400 CPU, 8G RAM, SATA2 disks, 4G heap
size.
ES runs in embedded mode, with a small layer of web logic wrapping it.
All 9 nodes are load balanced.
GC is tuned properly; there are no long tenured collections at all.
We have a relatively simple document mapping with 5 fields and one
object with dynamic fields; 40 million entries in total are to be
indexed.
We get a pretty slow index rate of 100 index ops/second per node,
with response times of ~50-200 msec (sync mode).
Roughly once a minute (the interval varies from 30 seconds up to 2
minutes) we experience a strange "global" freeze lasting 10-20
seconds, during which insert times increase to as much as 10 seconds
each (our timeout value is 10 seconds). The freeze is not absolute:
some requests are still processed during it, but overall system
performance drops.
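One way to pin down exactly when the freeze windows start and end is
to timestamp every slow insert in the web layer and compare the
timestamps across nodes. The sketch below is only an illustration:
SlowCallLog is a hypothetical helper, not part of ES, and the
Supplier stands in for whatever call actually performs the index
operation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper (not an ES API): records when slow calls started.
class SlowCallLog {
    // Wall-clock start times (ms since epoch) of calls at or over the threshold.
    private final List<Long> slowStarts = new ArrayList<>();
    private final long thresholdMillis;

    SlowCallLog(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    // Runs op, measures its duration, and logs it if it was slow.
    // The call itself is not synchronized, so concurrent inserts stay concurrent.
    <T> T timed(Supplier<T> op) {
        long wallStart = System.currentTimeMillis();
        long t0 = System.nanoTime();
        try {
            return op.get();
        } finally {
            long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
            if (elapsedMs >= thresholdMillis) {
                synchronized (slowStarts) {
                    slowStarts.add(wallStart);
                }
                System.err.println("slow call: start=" + wallStart
                        + " elapsedMs=" + elapsedMs);
            }
        }
    }

    // Returns a snapshot copy of the recorded start times.
    List<Long> slowStarts() {
        synchronized (slowStarts) {
            return new ArrayList<>(slowStarts);
        }
    }
}
```

With a threshold of, say, 1000 msec, the logged start times on each
node can be lined up against the vmstat/iostat output for the same
period.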
We currently monitor with htop, bmon, vmstat, iostat and lsof.
During normal operation, CPU is at about 50%, IO utilization at about
10%, about 4G of RAM is used for file caching, and network traffic is
about 800KB/sec RX/TX.
During the freeze period, CPU is almost 0% on all nodes (!!!), IO
utilization is almost the same, and network traffic does not change.
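Since CPU drops to almost 0% during the freeze, the threads are
presumably blocked or waiting rather than working. Because ES is
embedded in our own JVM, a small in-process probe could sample thread
states once a second and show what they are waiting on. This is a
minimal sketch using the standard java.lang.management API;
FreezeProbe is a hypothetical name, and taking a full thread dump
every second adds some overhead of its own.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical diagnostic helper, meant to run inside the same JVM as ES.
class FreezeProbe {
    // Counts live JVM threads by state (RUNNABLE, BLOCKED, WAITING, ...).
    static Map<Thread.State, Integer> countThreadStates() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip a thread that ended mid-dump
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    // Prints each BLOCKED thread with the lock it wants and the lock's owner.
    static void printBlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                System.out.println("  " + info.getThreadName()
                        + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        for (int i = 0; i < samples; i++) {
            System.out.println(System.currentTimeMillis() + " " + countThreadStates());
            printBlocked();
            Thread.sleep(1000);
        }
    }
}
```

If the BLOCKED or WAITING counts jump during a freeze, the lock names
and owners should point at the component that is holding everything
up.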
My first guess is Lucene's merge process, but that should run in the
background without affecting overall system performance.
Judging by the performance measurements posted by other users on this
mailing list, 100 tps is not a particularly high value.
I have two questions:
- What is the reason for the temporary slowdown, and how can I
investigate it?
- What is the reason for the slow performance (slow in my opinion,
that is)? Is it possible to get a sustained rate of 1000 tps per node?
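To put the two rates in perspective, here is a back-of-the-envelope
sketch of the total load time. It assumes that all 9 nodes index in
parallel at the stated per-node rate and it ignores the freezes, so it
is a lower bound rather than a measurement; IndexTimeEstimate is just
an illustrative name.

```java
// Rough estimate only: assumes every node sustains the same rate in parallel.
class IndexTimeEstimate {
    // Seconds needed to index totalDocs across nodes at opsPerNodePerSec each.
    static double estimateSeconds(long totalDocs, int nodes, int opsPerNodePerSec) {
        return (double) totalDocs / ((long) nodes * opsPerNodePerSec);
    }

    public static void main(String[] args) {
        long docs = 40_000_000L; // entries to index, from the description above
        int nodes = 9;
        for (int rate : new int[] {100, 1000}) {
            double hours = estimateSeconds(docs, nodes, rate) / 3600.0;
            System.out.printf("%d ops/s per node: %.2f hours%n", rate, hours);
        }
    }
}
```

At the current 100 ops/second per node the full load takes roughly 12
hours; at 1000 ops/second per node it would take a bit over an hour.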
Please advise.
Thanks,
Vadim