Hello!
I just started with the ELK stack a few weeks ago and have spent time on the tutorial videos, hands-on exercises, etc. I have elasticsearch-2.3.4 and logstash-2.3.4 on a Win7 PC (laptop) with 8GB RAM and a 64-bit JDK 1.8.
I'm trying out a new way of storing my application logs (text files in a custom format) and then analysing them with ELK - eventually all the application engineers will have access to the logs through the Elasticsearch or Kibana interfaces. To try this out faster, I'm initially playing with a 10MB Apache log file with about 100,000 lines.
When I use logstash with the file as input, the COMMONAPACHELOG grok filter plus the metrics filter, and output only the metric events to stdout (nothing else in the output), I get a rate_1m of around 3500.
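For reference, the pipeline config I'm describing looks roughly like this (the file path is a placeholder and the exact stdout formatting may differ slightly from what I actually ran):

```
input {
  file {
    path => "C:/logs/apache_sample.log"   # placeholder path to the 10MB Apache log
    start_position => "beginning"
    sincedb_path => "NUL"                 # re-read the file from the start on Windows
  }
}

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
  metrics {
    meter   => "events"
    add_tag => "metric"
  }
}

output {
  # print only the metric events; drop the parsed log lines
  if "metric" in [tags] {
    stdout {
      codec => line { format => "rate_1m: %{[events][rate_1m]}" }
    }
  }
}
```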
When I add elasticsearch (running on the same PC) to the logstash output, the rate_1m drops to between 200 and 300. I tried configuring elasticsearch with mlockall and an ES_HEAP_SIZE of 2G and 3G (4G failed to allocate) - the rate stays between 200 and 300 with no significant change across heap sizes.
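The only change for the second test is in the output section, roughly like this (host/port are the defaults, index name left at the default):

```
output {
  if "metric" in [tags] {
    stdout {
      codec => line { format => "rate_1m: %{[events][rate_1m]}" }
    }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]   # elasticsearch running on the same PC
    }
  }
}
```

For the mlockall and heap settings mentioned above, I set bootstrap.mlockall: true in elasticsearch.yml and ES_HEAP_SIZE in the environment before starting elasticsearch.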
My questions:
- In my opinion, 3500 is poor performance for logstash. A Perl script on my machine manages about 6,000,000 lines per minute with a similar match pattern and no output. Is this the rate I should expect in exchange for all the extra functionality logstash offers?
- Assuming I'm OK with a rate around 3500 for my application, 200 is just too big a drop once elasticsearch enters the mix. The PC is definitely not loaded while this runs: CPU hovers between 25% and 75%, and total disk IO stays below 10MB/s, while the PC can easily do 30-40MB/s. So system resources are not the bottleneck here. What am I doing wrong, and how can I improve this rate?