I have the following setup on a Mac running an i7 (8-core machine).
Filebeat (reading local /tmp/perf/.. logs) --> single-instance Elasticsearch in Docker
I was only able to get ~9,400 logs/sec indexed into Elasticsearch, and the store.size for 200MB of log data is 1.1GB, roughly 4.5x overhead. The configs are listed at the end. I am sure my configs are not optimal, and I would like to hear from the community how you tune them to trade off indexing performance against storage efficiency in Elasticsearch.
- How can I improve the throughput for this test case in a dev environment? I have configured spool_size, bulk_max_size, workers, etc. (This is not a production-like config, but I would like to understand the constraints and performance of this setup before scaling it out with dedicated client/data/master nodes.)
- How can I reduce the Elasticsearch store size, i.e. the 1.1GB / 4.5x overhead for 200MB of log data? (A sketch of the index template I was thinking of trying is below.)
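For the store size, would an index template along these lines help? This is only a rough sketch for 5.x (the template name and pattern are placeholders, and I have not measured the savings): single shard, no replica, best_compression codec, _all disabled, and a longer refresh interval.
$ curl -XPUT 'http://127.0.0.1:9200/_template/agent_logs' -H 'Content-Type: application/json' -d '
{
  "template": "agent.logs.*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index.codec": "best_compression",
    "index.refresh_interval": "30s"
  },
  "mappings": {
    "_default_": {
      "_all": { "enabled": false }
    }
  }
}'
My understanding is that best_compression trades some indexing CPU for smaller stored fields, and disabling _all avoids indexing every field a second time.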
 
Any pointers would really help.
Total logs exported: 200MB of log data (2,000,000 log lines in total)
[perf] $ cd /tmp/perf/ ; ls -ltr
total 460512
-rw-r--r--  1 xyz  wheel  117888896 Jul 28 16:02 nphal.log
-rw-r--r--  1 xyz  wheel  117888896 Jul 28 16:02 npagent.log
Time taken to index 2M log records (200MB in total size)
[perf] $ time watch curl http://127.0.0.1:9200/_cat/indices?v
real    3m32.463s
user    0m0.733s
sys    0m0.697s
Indexing rate for the 2M entries: 9433 logs/second
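(That figure is simply the line count divided by the wall-clock time from the `time` output above:)
$ awk 'BEGIN { print int(2000000 / (3*60 + 32)) }'   # -> 9433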
2M logs indexed in Elastic
[perf] $ curl http://127.0.0.1:9200/_cat/indices?v | grep agent
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed
100  1000  100  1000    0     0  69405      0 --:--:-- --:--:-- --:--:-- 71428
yellow open   agent.logs.2017.07.28        twGpthLpRZKe5-pGO0YlLw   5   1    2000000            0      1.1gb          1.1gb
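To see where the 1.1GB actually goes, I still need to dig into the per-segment and store stats, e.g. something like:
$ curl 'http://127.0.0.1:9200/_cat/segments/agent.logs.2017.07.28?v'
$ curl 'http://127.0.0.1:9200/agent.logs.2017.07.28/_stats/store,docs?pretty'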
Filebeat config
filebeat.prospectors:
- input_type: log
  paths:
    - /tmp/perf/npagent.log
  symlinks: true
  scan_frequency: 500ms
- input_type: log
  paths:
    - /tmp/perf/nphal.log
  symlinks: true
  scan_frequency: 500ms
 
filebeat.spool_size: 65536
output.elasticsearch:
  worker: 8
  bulk_max_size: 4096
  hosts: ["127.0.0.1:9200"]
  index: "agent.logs.%{+YYYY.MM.dd}"
Elasticsearch Docker config
services:
  elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.4.1
    container_name: elastic
    environment:
      - ES_JAVA_OPTS=-Xms1g -Xmx1g -Xmn500m -XX:MaxMetaspaceSize=500m
    mem_limit: 1g
    ports:
      - 9200:9200
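One thing I intend to try, assuming the machine can spare the memory, is giving the container more room than the heap (heap at roughly half the container limit), since with mem_limit equal to -Xmx there is no headroom for Elasticsearch's off-heap memory. A variant of the service I have in mind:
services:
  elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.4.1
    container_name: elastic
    environment:
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
    mem_limit: 2g
    ports:
      - 9200:9200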
I also enabled memory and CPU profiling for Filebeat. Most of the time is spent in the JSON encoder and runtime.malloc.
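For reference, the profiles were gathered roughly like this (assuming libbeat's -cpuprofile/-memprofile flags; the CPU profile path here is just illustrative):
$ ./filebeat -e -c filebeat.yml -cpuprofile /tmp/perf/cpu.pprof -memprofile /tmp/perf/mem.txt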
The memory profile shows the following:
$ go tool pprof -inuse_space filebeat /tmp/perf/mem.txt
Entering interactive mode (type "help" for commands)
(pprof) list
Total: 37.37MB
# runtime.MemStats
# Alloc = 39052504
# TotalAlloc = 25032945592
# Sys = 610914448
# Lookups = 100
# Mallocs = 320146182
# Frees = 320135121
# HeapAlloc = 39052504
# HeapSys = 566919168
# HeapIdle = 525926400
# HeapInuse = 40992768
# HeapReleased = 0
# HeapObjects = 11061
# Stack = 1212416 / 1212416
# MSpan = 85280 / 9371648
# MCache = 9600 / 16384
# BuckHashSys = 1475739
# NextGC = 75925642