ELK Stack performance optimization & Elasticsearch heap size

Hi

Sorry for posting three questions in one topic, but they may be related.

I'm running an ELK stack on a standalone server (VM).
32GB RAM, 16 CPU cores, ES 2.0, Logstash 2.0

I have some questions about performance and storage.

1st: ES Memory Usage
Today I had to remove some Logstash indices because ES crashed with an OutOfMemory error.
The Java heap size was configured at 8G, with 16G of total server memory. I then increased the total memory to 32G and JAVA_HEAP_SIZE to 16G, but ES still crashed with an OutOfMemory error.

Number of Logstash indices:

drwxr-xr-x 8 elasticsearch elasticsearch 4096 Nov 17 09:17 logstash-2015.11.06
drwxr-xr-x 8 elasticsearch elasticsearch 4096 Nov 16 15:03 logstash-2015.11.16
drwxr-xr-x 8 elasticsearch elasticsearch 4096 Nov 17 01:00 logstash-2015.11.17
drwxr-xr-x 8 elasticsearch elasticsearch 4096 Nov 18 01:00 logstash-2015.11.18
drwxr-xr-x 8 elasticsearch elasticsearch 4096 Nov 19 01:00 logstash-2015.11.19
drwxr-xr-x 8 elasticsearch elasticsearch 4096 Nov 20 00:59 logstash-2015.11.20
drwxr-xr-x 8 elasticsearch elasticsearch 4096 Nov 21 00:59 logstash-2015.11.21
drwxr-xr-x 8 elasticsearch elasticsearch 4096 Nov 22 00:59 logstash-2015.11.22
drwxr-xr-x 8 elasticsearch elasticsearch 4096 Nov 23 00:59 logstash-2015.11.23
drwxr-xr-x 8 elasticsearch elasticsearch 4096 Nov 24 00:59 logstash-2015.11.24
drwxr-xr-x 8 elasticsearch elasticsearch 4096 Nov 25 00:59 logstash-2015.11.25
drwxr-xr-x 8 elasticsearch elasticsearch 4096 Nov 26 00:59 logstash-2015.11.26
drwxr-xr-x 8 elasticsearch elasticsearch 4096 Nov 27 00:59 logstash-2015.11.27

I removed some older Logstash indices and restarted ES with a total of 4 remaining Logstash indices.
After that everything went back to normal.

So my question: does ES somehow try to load all indices into memory? Or what is the reason for the OutOfMemory error? Can I somehow disable old indices instead of deleting them?

One ES logstash index is about 40G.

2nd: Logstash performance
We get about 600-1200 Logstash events per second.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1651 logstash  39  19 26.012g 0.978g  14804 S 965.9  3.1 442:44.56 java
 8472 elastic+  20   0 72.601g 7.232g 339852 S  52.4 23.0  18:11.63 java

top shows CPU usage between 700% and 1400% for the Logstash process. I don't know whether this is a problem or just good use of the 16 CPU cores.

But the big problem is that if the ELK stack is stocking (no matter what the cause is), other servers slow down and/or stop answering client requests. These are common syslog problems, though (I had the same problems with a standalone central rsyslog server).
We send all events with syslog directly to Logstash using TCP.
Should we switch to UDP (syslog), or transport the logs some other way?

3rd: ES Storage
Currently the ES datastore is located on an iSCSI LUN instead of a Fibre Channel LUN, because if we reach 30 Logstash indices, this would be about 30*40G of disk space, which is pretty expensive with SAN storage.
(The performance of my iSCSI LUN is not that bad, but surely not as good as the SAN performance.)

Is it possible to store, for example, the three newest indices on a Fibre Channel LUN and all older ones on an iSCSI LUN?

Thanks

You should really use the APIs for looking at indices, rather than an ls on disk. I hope you used them for deleting the indices at least!
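For anyone following along, here is a rough sketch of what "use the APIs" could look like from Python, using the `_cat/indices` endpoint. The base URL and the helper names are my own assumptions (a node listening on the default HTTP port on localhost), not something from the original setup.

```python
import urllib.request

# Assumption: the node listens on the default HTTP port on localhost.
ES_URL = "http://localhost:9200"

COLUMNS = ("index", "status", "docs_count", "store_size")


def cat_indices_url(base_url=ES_URL):
    """Build the _cat/indices URL, asking only for the columns we parse."""
    return base_url.rstrip("/") + "/_cat/indices?h=index,status,docs.count,store.size"


def parse_cat_indices(text):
    """Turn the whitespace-separated _cat output into a list of dicts.

    Closed indices report no document count or size, so short rows are
    padded with empty strings instead of being dropped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        fields = fields[:len(COLUMNS)] + [""] * (len(COLUMNS) - len(fields))
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def list_indices(base_url=ES_URL):
    """Fetch and parse the index list from a running node."""
    with urllib.request.urlopen(cat_indices_url(base_url)) as resp:
        return parse_cat_indices(resp.read().decode("utf-8"))
```

Calling `list_indices()` against a live node should give one dict per index, which is easier to sort and filter than directory listings.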

No it does. Too much data(?), it's hard to say without seeing the error. You can close them with the _close API instead of deleting them.
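A minimal sketch of closing old daily indices, assuming the `logstash-YYYY.MM.DD` naming shown in the question and a node on localhost:9200. The function names and the "keep the newest N days" policy are only illustrative; a closed index stays on disk and can be reopened later with `_open`.

```python
import urllib.request
from datetime import datetime, timedelta

# Assumption: the node listens on the default HTTP port on localhost.
ES_URL = "http://localhost:9200"


def index_date(name, prefix="logstash-"):
    """Return the date encoded in a daily index name, or None if it does not match."""
    if not name.startswith(prefix):
        return None
    try:
        return datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
    except ValueError:
        return None


def indices_to_close(names, today, keep_days):
    """Select daily indices that fall outside the newest `keep_days` days."""
    cutoff = today - timedelta(days=keep_days)
    selected = []
    for name in names:
        day = index_date(name)
        if day is not None and day <= cutoff:
            selected.append(name)
    return sorted(selected)


def close_request(index, base_url=ES_URL):
    """Build (but do not send) the POST request for the close index API."""
    url = f"{base_url.rstrip('/')}/{index}/_close"
    return urllib.request.Request(url, data=b"", method="POST")
```

To actually close something, pass the request to `urllib.request.urlopen(close_request(name))` for each selected name. Building the request separately makes it easy to print and check what would be closed first.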

I'd say the latter. Depends on your config really.

Stocking?

Only if you have multiple nodes and use allocation awareness.
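As a rough illustration of how that could look with more than one node: the sketch below uses index-level shard allocation filtering (a mechanism related to, but not the same as, allocation awareness). It assumes each node is tagged with a custom attribute, here hypothetically named `box_type`, with values such as `hot` (data path on the Fibre Channel LUN) and `warm` (data path on the iSCSI LUN). The attribute name, values, and URL are all assumptions.

```python
import json
import urllib.request

# Assumption: the node listens on the default HTTP port on localhost.
ES_URL = "http://localhost:9200"
# Hypothetical custom node attribute; it must match what the nodes are tagged with.
NODE_ATTR = "box_type"


def allocation_settings(tier, attr=NODE_ATTR):
    """Index settings that require shards to live on nodes tagged with `tier`."""
    return {"index.routing.allocation.require." + attr: tier}


def settings_request(index, tier, base_url=ES_URL):
    """Build (but do not send) the PUT request for the update index settings API."""
    url = f"{base_url.rstrip('/')}/{index}/_settings"
    body = json.dumps(allocation_settings(tier)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

Sending `settings_request("logstash-2015.11.06", "warm")` with `urllib.request.urlopen` would ask the cluster to relocate that index's shards to the nodes tagged `warm`. New daily indices could get the `hot` value through an index template.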

Hi,

Sorry for my poor English, but I want to share my experience with this.

Recently I had a 3-day enterprise training on ES only. The most important thing I noticed was how the master role and the work are shared between the nodes of a cluster.

It seems you are making the same mistake we did: treating ELK as a standalone app, which it is not.

You have to configure dedicated master nodes that do not store data, then add, for example, 2 data nodes per master and 2 load balancers. Spread the whole stack across 3 servers if possible.

I hope this helps.

Best regards
