I have the following setup on a Mac running an i7 (8-core machine).
Filebeat (reading local /tmp/perf/.. logs) --> single-instance Elasticsearch in Docker
I was only able to get ~9,400 logs/sec indexed into Elasticsearch, and the store.size for 200 MB of log data is 1.1 GB, roughly 4.5x overhead. The configs are listed at the end. I am sure my configs are not optimal, and I would like to hear from the community how they tune them for performance vs. efficient storage in Elasticsearch (a trade-off).
- How can I improve the throughput for this test case in a dev environment? I have configured spool_size, bulk_max_size, workers, etc. (This is not a production-like config, but I would like to understand the constraints and performance of this setup before scaling it up to a client/data/master node topology.)
- How can I reduce the Elasticsearch store size, i.e. the 1.1 GB / 4.5x overhead for 200 MB of log data? (I sketch below what I am planning to try.)
Any pointers would really help.
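For the second question, the storage knobs I have found in the 5.x docs but have not benchmarked yet are index.codec: best_compression, disabling _all, and dropping the default 5 shards / 1 replica for a single node. A sketch of the index template I am planning to try (template name and pattern are just my own):

curl -XPUT 'http://127.0.0.1:9200/_template/agent_logs' -H 'Content-Type: application/json' -d '
{
  "template": "agent.logs.*",
  "settings": {
    "index.codec": "best_compression",
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  },
  "mappings": {
    "_default_": {
      "_all": { "enabled": false }
    }
  }
}'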
Total logs exported: 200 MB (2,000,000 log lines in total)
[perf] $ cd /tmp/perf/ ; ls -ltr
total 460512
-rw-r--r-- 1 xyz wheel 117888896 Jul 28 16:02 nphal.log
-rw-r--r-- 1 xyz wheel 117888896 Jul 28 16:02 npagent.log
Time taken to index 2M log records (200 MB total):
[perf] $ time watch curl http://127.0.0.1:9200/_cat/indices?v
real 3m32.463s
user 0m0.733s
sys 0m0.697s
Time to index 2M entries: 2,000,000 logs / ~212s ≈ 9,433 logs/second
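(The time watch curl above is just me stopping the clock when the doc count reaches 2M, so the rate is approximate. A slightly tighter measurement I may switch to, polling the standard _cat/count endpoint; the index pattern and threshold are specific to my test:)

#!/bin/sh
# poll the doc count once a second; print elapsed time once all 2M docs are indexed
start=$(date +%s)
count=0
while [ "$count" -lt 2000000 ]; do
  count=$(curl -s 'http://127.0.0.1:9200/_cat/count/agent.logs.*' | awk '{print $3}')
  # treat empty or non-numeric output (e.g. index not created yet) as zero
  case "$count" in ''|*[!0-9]*) count=0 ;; esac
  sleep 1
done
echo "indexed $count docs in $(( $(date +%s) - start ))s"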
2M logs indexed in Elasticsearch:
[perf] $ curl http://127.0.0.1:9200/_cat/indices?v | grep agent
yellow open agent.logs.2017.07.28 twGpthLpRZKe5-pGO0YlLw 5 1 2000000 0 1.1gb 1.1gb
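To see where the 1.1 GB actually goes before changing anything, I am also planning to pull the per-segment and store stats (standard _cat and _stats endpoints):

curl 'http://127.0.0.1:9200/_cat/segments/agent.logs.2017.07.28?v'
curl 'http://127.0.0.1:9200/agent.logs.2017.07.28/_stats/store,segments?pretty'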
Filebeat config
filebeat.prospectors:
- input_type: log
  paths:
    - /tmp/perf/npagent.log
  symlinks: true
  scan_frequency: 500ms
- input_type: log
  paths:
    - /tmp/perf/nphal.log
  symlinks: true
  scan_frequency: 500ms

filebeat.spool_size: 65536

output.elasticsearch:
  worker: 8
  bulk_max_size: 4096
  hosts: ["127.0.0.1:9200"]
  index: "agent.logs.%{+YYYY.MM.dd}"
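On top of these, the other throughput knobs I have been varying (all from the 5.x Filebeat reference; the values are just my current guesses, not recommendations):

filebeat.idle_timeout: 5s            # flush a partially filled spooler sooner
filebeat.prospectors:
- input_type: log
  paths:
    - /tmp/perf/npagent.log
  harvester_buffer_size: 65536       # larger per-harvester read buffer (default 16384)
output.elasticsearch:
  compression_level: 0               # keep bulk compression off on localhost (the default)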
Elasticsearch Docker config
services:
  elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.4.1
    container_name: elastic
    environment:
      - ES_JAVA_OPTS=-Xms1g -Xmx1g -Xmn500m -XX:MaxMetaspaceSize=500m
    mem_limit: 1g
    ports:
      - 9200:9200
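One thing I already suspect here: the 1g heap is equal to the 1g container limit, which leaves nothing for off-heap memory or the OS page cache. The variant I plan to test next, following the usual "heap at ~50% of available memory" guidance (sizes are guesses for this laptop):

services:
  elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.4.1
    container_name: elastic
    environment:
      - ES_JAVA_OPTS=-Xms2g -Xmx2g    # heap at ~50% of the container memory
    mem_limit: 4g                     # leave the rest for page cache / off-heap
    ports:
      - 9200:9200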
I also enabled memory and CPU profiling for Filebeat. Most of the time is spent in the JSON encoder and runtime.malloc.
Memory profiling shows the following:
$ go tool pprof -inuse_space filebeat /tmp/perf/mem.txt
Entering interactive mode (type "help" for commands)
(pprof) list
Total: 37.37MB
# runtime.MemStats
# Alloc = 39052504
# TotalAlloc = 25032945592
# Sys = 610914448
# Lookups = 100
# Mallocs = 320146182
# Frees = 320135121
# HeapAlloc = 39052504
# HeapSys = 566919168
# HeapIdle = 525926400
# HeapInuse = 40992768
# HeapReleased = 0
# HeapObjects = 11061
# Stack = 1212416 / 1212416
# MSpan = 85280 / 9371648
# MCache = 9600 / 16384
# BuckHashSys = 1475739
# NextGC = 75925642
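For reference, this is roughly how I collected the profiles: starting Filebeat with its -httpprof flag and pointing go tool pprof at the pprof endpoints (host/port are arbitrary):

# start filebeat with the pprof HTTP endpoint enabled
./filebeat -e -c filebeat.yml -httpprof 127.0.0.1:6060

# in another shell: heap profile, then a 30s CPU profile
go tool pprof -inuse_space filebeat http://127.0.0.1:6060/debug/pprof/heap
go tool pprof filebeat http://127.0.0.1:6060/debug/pprof/profile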