Here is our setup:
Packetbeat (v0.5) * 8 >> Redis (v3.0.5) << Logstash (v1.5.5) >> Elasticsearch (v1.7.3)
We are very satisfied with this setup, except that we have noticed what looks like a memory leak on our Redis server. Memory usage increases gradually until it reaches the memory limit. When that happens, we have no choice but to restart the Redis server and all the Packetbeat agents.
Here are some graphs showing the activity over 6 hours.
Has anybody encountered this issue?
Are you sure it is a memory leak? With Redis basically acting as a queue between Packetbeat and Logstash, it is more likely that your (single?) Logstash instance is not able to handle all the data in time.
Can you somehow find the rate at which data is put into Redis and the rate at which it is pulled out? E.g. the Redis INFO command returns total_net_input_bytes and total_net_output_bytes.
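One way to get those two rates is to take the INFO output twice, some seconds apart, and divide the counter differences by the interval. The sketch below only parses INFO text and does the arithmetic; the helper names and the sample values are hypothetical, and in practice the text would come from `redis-cli INFO`. Keep in mind that these counters cover all clients (the Packetbeat agents and Logstash together), so they give a rough picture rather than an exact producer/consumer balance.

```python
def parse_info(text):
    """Parse the text returned by the Redis INFO command into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Stats".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            stats[key] = value
    return stats


def byte_rates(before, after, seconds):
    """Return (input_bytes_per_s, output_bytes_per_s) between two INFO samples."""
    rate_in = (int(after["total_net_input_bytes"])
               - int(before["total_net_input_bytes"])) / seconds
    rate_out = (int(after["total_net_output_bytes"])
                - int(before["total_net_output_bytes"])) / seconds
    return rate_in, rate_out


# Hypothetical samples taken 60 seconds apart (values are made up).
sample_t0 = "# Stats\ntotal_net_input_bytes:1000\ntotal_net_output_bytes:400\n"
sample_t1 = "# Stats\ntotal_net_input_bytes:7000\ntotal_net_output_bytes:1600\n"

rates = byte_rates(parse_info(sample_t0), parse_info(sample_t1), seconds=60)
print(rates)  # (100.0, 20.0)
```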
Have you considered upgrading Packetbeat to v1.0.0? This version also supports load balancing events across multiple Logstash instances.
The graph above, "Redis - commands queued", shows the value returned by the command "LLEN packetbeat", and "Redis - tps" shows total_commands_processed per second, as returned by INFO. When redis-server reaches its memory limit (used_memory reaches the Redis setting maxmemory), the list is empty. We have tried some commands, like flushdb or flushall, but redis-server doesn't free the memory.
Currently, I have checked used_memory, total_net_input_bytes and total_net_output_bytes:
"LLEN packetbeat" returns 0, and "KEYS *" returns "(empty list or set)". What does that mean? Shouldn't total_net_input_bytes be equal to total_net_output_bytes if all the data is consumed by the Logstash redis input plugin? Why is 1 GB of memory used when no data is stored?
Yes, we plan to upgrade to 1.0, but I wonder whether it would solve our issue.
The memory was used by client connections, not by the data.
I found the root cause by reading the Redis documentation: http://redis.io/topics/clients
The Packetbeat client connections were eating all the memory in their output buffers (see oll and omem, the output list length and the output buffer memory, in the CLIENT LIST line below), i.e.:
id=305 addr=10.204.5.199:29916 fd=14 name= age=72991 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=2838444 omem=179613873 events=rw cmd=rpush
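To spot such connections quickly, the CLIENT LIST output can be split into key=value fields and filtered on omem. This is only a minimal sketch: the function names and the threshold are my own choices, and the input is assumed to be the plain text printed by `redis-cli CLIENT LIST`.

```python
def parse_client_line(line):
    """Split one CLIENT LIST line into a dict of its key=value fields."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value  # empty values such as "name=" become ""
    return fields


def heavy_clients(client_list_text, omem_threshold):
    """Return the clients whose output buffer memory (omem) is at or above the threshold."""
    heavy = []
    for line in client_list_text.splitlines():
        if not line.strip():
            continue
        fields = parse_client_line(line)
        if int(fields.get("omem", 0)) >= omem_threshold:
            heavy.append(fields)
    return heavy


# The line quoted above, used as sample input.
sample = ("id=305 addr=10.204.5.199:29916 fd=14 name= age=72991 idle=1 flags=N "
          "db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=2838444 "
          "omem=179613873 events=rw cmd=rpush")

# Hypothetical threshold of 32000000 bytes.
for client in heavy_clients(sample, omem_threshold=32000000):
    print(client["addr"], client["cmd"], int(client["omem"]) // (1024 * 1024), "MiB")
```

For the line above, this reports about 171 MiB of output buffer held by a single rpush client.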
To fix my issue, I set this parameter (by default there is no limit for normal clients!):
client-output-buffer-limit normal 32000000 0 0
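The directive takes a client class, a hard limit, a soft limit and a soft-limit duration in seconds; a value of 0 disables the corresponding limit. The sketch below models only the hard limit, which is the part used here: a client is dropped as soon as its output buffer reaches it. It is a simplified illustration, not the Redis implementation, and it assumes plain byte counts (no unit suffixes such as "mb"); the function names are hypothetical.

```python
def parse_limit_directive(directive):
    """Parse 'client-output-buffer-limit <class> <hard> <soft> <soft-seconds>'."""
    name, client_class, hard, soft, soft_seconds = directive.split()
    if name != "client-output-buffer-limit":
        raise ValueError("unexpected directive: " + name)
    return {
        "class": client_class,
        "hard": int(hard),              # bytes, 0 means no hard limit
        "soft": int(soft),              # bytes, 0 means no soft limit
        "soft_seconds": int(soft_seconds),
    }


def exceeds_hard_limit(omem, limit):
    """True if a client using omem bytes of output buffer would hit the hard limit."""
    return limit["hard"] > 0 and omem >= limit["hard"]


new_limit = parse_limit_directive("client-output-buffer-limit normal 32000000 0 0")
default_limit = parse_limit_directive("client-output-buffer-limit normal 0 0 0")

# omem value taken from the CLIENT LIST line quoted in this thread.
print(exceeds_hard_limit(179613873, new_limit))      # True
print(exceeds_hard_limit(179613873, default_limit))  # False
```

With the 0 0 0 default, the same client is never disconnected, which is why its buffer could grow until maxmemory was reached.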