Filebeat writes to Redis so fast that Logstash can't keep up reading from Redis


#1

Hi all,

my Logstash is 2.3.4
Elasticsearch is 2.3.4
Filebeat is 1.2.3
Redis is 3.2.0
my OS is CentOS

My situation is as the title says: a large volume of logs is written from Filebeat to Redis, with just one Filebeat process out there doing the job.
Below is my Filebeat config file:
filebeat:
  spool_size: 8196
  publish_async: true
  prospectors:
    - paths:
        - /data/behavior/*/*/*/*/*/info.log
      input_type: log
      tail_files: true
      ignore_older: 17h
      scan_frequency: 10s
output:
  redis:
    host: "10.11.147.18"
    port: 6379
    save_topology: true
    index: "filebeat"
    db: 0
    db_topology: 1
    timeout: 5
    reconnect_interval: 1
My Logstash config is:
input {
  redis {
    data_type => "list"
    key => "filebeat"
    host => "10.11.147.18"
    port => 6379
    threads => 20
  }
}
output {
  elasticsearch {
    index => "logstash-sso-%{+YYYY.MM.dd}"
    hosts => ["10.148.16.174","10.110.80.71"]
  }
}

It still can't keep up even when I start 3 Logstash processes.
Is there any inappropriate config leading to this situation?
Please give me some tips~

thank you all~


(Magnus Bäck) #2

What's the event rate? Don't you have any filters in your configuration? Have you tried raising the redis input's batch_count option?


#3

@magnusbaeck thank you for the reply~

Event rate? How can I measure it?

I don't need filters yet; I'm just using this for testing so far.

The default value of batch_count is 50 (does this mean 50 logs per batch?). After your tip, I set it to 10000 and started 3 Logstash processes to consume from Redis.

It works... or at least seemed to work at the very beginning, but after running for about 5 minutes they output a red error, "Attempted to send a bulk request to Elasticsearch configured at", and stop consuming logs from Redis; I can see the length of the list is no longer decreasing.


(Magnus Bäck) #4

Event rate? How can I measure it?

Well, you could e.g. shut down Logstash and measure the increase of events in the Redis list.
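A rough sketch of that measurement with redis-cli (the host, key name, and 60-second interval here are only illustrative; it assumes Logstash is stopped while Filebeat keeps writing, so the list only grows):

```shell
# rate BEFORE AFTER SECONDS -> net events per second between two LLEN samples
rate() {
  echo $(( ($2 - $1) / $3 ))
}

# Hypothetical sampling, using the host and key from the configs above:
#   before=$(redis-cli -h 10.11.147.18 llen filebeat)
#   sleep 60
#   after=$(redis-cli -h 10.11.147.18 llen filebeat)
#   rate "$before" "$after" 60
rate 0 120000 60   # prints 2000
```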

default value of batch_count is 50

Really? All Logstash versions I checked had 1 (one) as the default batch_count.

(does this mean 50 logs per batch?)

It's the number of events fetched per request to Redis.
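In the configuration from the first post, setting it would look like this (1000 is only an illustrative value; tune it to your load):

```
input {
  redis {
    data_type => "list"
    key => "filebeat"
    host => "10.11.147.18"
    port => 6379
    threads => 20
    batch_count => 1000   # events fetched per request to Redis
  }
}
```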

It works... or at least seemed to work at the very beginning, but after running for about 5 minutes they output a red error, "Attempted to send a bulk request to Elasticsearch configured at", and stop consuming logs from Redis; I can see the length of the list is no longer decreasing.

What's the full message in the Logstash log? Is there anything interesting in the Elasticsearch logs?


#5

@magnusbaeck
Ah~ sorry, I made a mistake: it's batch_event that is 50 by default.

Sorry for my delay... I just did some trying and observing.
Every Logstash log has just the same 2 error records:
{:timestamp=>"2016-08-15T15:51:36.306000+0800", :message=>"Pipeline main started"}
{:timestamp=>"2016-08-15T15:53:08.140000+0800", :message=>"Attempted to send a bulk request to Elasticsearch configured at '[\"http://10.148.16.174:9200\", \"http://10.110.80.71:9200\"]', but Elasticsearch appears to be unreachable or down!", :error_message=>"Connection refused", :class=>"Manticore::SocketException", :level=>:error}
But there seems to be no impact on the Elasticsearch cluster, as the index size is still increasing normally,
and Elasticsearch has no error logs.

And the situation right now is still:

It works... or at least seemed to work at the very beginning, but after running for about 5 minutes they output a red error, "Attempted to send a bulk request to Elasticsearch configured at", and stop consuming logs from Redis; I can see the length of the list is no longer decreasing.

As if, as time goes on, Logstash can't consume as much as it did at the beginning.


#6

127.0.0.1:6379> llen sso.log
(integer) 0
127.0.0.1:6379> llen sso.log
(integer) 3073
127.0.0.1:6379> llen sso.log
(integer) 33
127.0.0.1:6379> llen sso.log
(integer) 34
127.0.0.1:6379> llen sso.log
(integer) 15218
127.0.0.1:6379> llen sso.log
(integer) 7895
127.0.0.1:6379> llen sso.log
(integer) 24786
127.0.0.1:6379> llen sso.log
(integer) 16157
127.0.0.1:6379> llen sso.log
(integer) 5260
127.0.0.1:6379> llen sso.log
(integer) 35
127.0.0.1:6379> llen sso.log
(integer) 4121
127.0.0.1:6379> llen sso.log
(integer) 17609
127.0.0.1:6379> llen sso.log
(integer) 27081
127.0.0.1:6379> llen sso.log
(integer) 12611
127.0.0.1:6379> llen sso.log
(integer) 7473
127.0.0.1:6379> llen sso.log
(integer) 19965
127.0.0.1:6379> llen sso.log
(integer) 13253
127.0.0.1:6379> llen sso.log
(integer) 23950
127.0.0.1:6379> llen sso.log
(integer) 49332
127.0.0.1:6379> llen sso.log
(integer) 22656
127.0.0.1:6379> llen sso.log
(integer) 0

It seemed to be doing well...
but as time went on, it showed a trend that told me "I am almost full..."

127.0.0.1:6379> llen sso.log
(integer) 176888
127.0.0.1:6379> llen sso.log
(integer) 265738
127.0.0.1:6379> llen sso.log
(integer) 281719
(4.34s)
127.0.0.1:6379> llen sso.log
(integer) 216504
127.0.0.1:6379> llen sso.log
(integer) 244908
127.0.0.1:6379> llen sso.log
(integer) 273645
127.0.0.1:6379> llen sso.log
(integer) 298234
127.0.0.1:6379> llen sso.log
(integer) 323000
127.0.0.1:6379> llen sso.log
(integer) 361920
127.0.0.1:6379> llen sso.log
(integer) 385123
127.0.0.1:6379> llen sso.log
(integer) 302905

127.0.0.1:6379> llen sso.log
(integer) 1177891
127.0.0.1:6379> llen sso.log
(integer) 1194948
127.0.0.1:6379> llen sso.log
(integer) 1216811
127.0.0.1:6379> llen sso.log
(integer) 1241705
127.0.0.1:6379> llen sso.log
(integer) 1262131
127.0.0.1:6379> llen sso.log
(integer) 1283168
127.0.0.1:6379> llen sso.log
(integer) 1302806
127.0.0.1:6379> llen sso.log
(integer) 1339122
127.0.0.1:6379> llen sso.log
(integer) 1360796
127.0.0.1:6379> llen sso.log
(integer) 1162271
127.0.0.1:6379> llen sso.log
(integer) 1209606
127.0.0.1:6379> llen sso.log
(integer) 1233355
127.0.0.1:6379> llen sso.log
(integer) 1266795
127.0.0.1:6379> llen sso.log
(integer) 1289300

Finally... they...

127.0.0.1:6379> llen sso.log
(integer) 2025467
127.0.0.1:6379> llen sso.log
(integer) 2050439
127.0.0.1:6379> llen sso.log
(integer) 2070784
127.0.0.1:6379> llen sso.log
(integer) 2091043
127.0.0.1:6379> llen sso.log
(integer) 2115579

I shut down Filebeat to stop feeding Redis... now I have to wait for the Logstash processes to digest the logs left in Redis...


#7

Hi @tanxiaolong
We faced a similar issue about half a year ago.
The solution was to have several independent Redis nodes behind a load balancer.
At first we used some cron and bash scripts, but then we switched to Metricbeat and added alerts on the redis.info.memory.used.rss value. An alert caused the Redis instance to be taken out of the load balancer pool.
We performed some load tests to determine the static threshold above which Redis/Logstash performance starts to drop from its maximum.
In this way we ensured maximum performance with minimal delays, even under stress loads.
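As a sketch, collecting that value with Metricbeat's redis module looks roughly like this (the host and period are illustrative; the alerting layer on top is a separate concern):

```yaml
metricbeat.modules:
  - module: redis
    metricsets: ["info"]
    period: 10s
    hosts: ["127.0.0.1:6379"]
```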
Hope you will find it useful,
D.


(system) #8