Possible bug with redis output plugin

Hi,

I'm using the redis input and output plugins in Logstash to pull events from Redis, preprocess them, and push them back into Redis.

The Logstash version is 5.1.1 and Redis is 3.2.6.

I configured the redis output plugin in Logstash as follows:

output {
        redis {
                host => "${LOGSTASH_REDIS_HOST}"
                port => "${LOGSTASH_REDIS_PORT}"
                data_type => "list"
                key => "specific_key"
                batch => true
                batch_events => 100
                batch_timeout => 10
                reconnect_interval => 30
        }
}

and the input plugin:

input {

        redis {
                host => "${LOGSTASH_REDIS_HOST}"
                port => "${LOGSTASH_REDIS_PORT}"
                data_type => "list"
                key => "generic_key"
                batch_count => 100
        }
}

The configuration works fine as long as generic_key holds more than 100 elements.

The problem appears when generic_key contains fewer than 100 elements.

When that happens, Logstash seems to wait for more items until it has the 100 it needs, but it apparently enters some kind of internal loop, because in the end the socket timeout fires.

Attached traces:

[2017-02-03T08:29:36,950][WARN ][logstash.outputs.redis ] Failed to send backlog of events to Redis {:identity=>"default", :exception=>#<Redis::TimeoutError: Connection timed out>,....
[2017-02-03T08:29:48,267][WARN ][logstash.outputs.redis ] Failed to flush outgoing items {:outgoing_count=>46, :exception=>"Redis::TimeoutError",....
[2017-02-03T08:29:48,267][WARN ][logstash.outputs.redis ] Failed to send backlog of events to Redis {:identity=>"default", :exception=>#<Redis::TimeoutError: Connection timed out>, ....
[2017-02-03T08:29:59,582][WARN ][logstash.outputs.redis ] Failed to flush outgoing items {:outgoing_count=>46, :exception=>"Redis::TimeoutError",...

We have tested different combinations of batch_events and batch_timeout, and all the tests lead us to think there is a bug at this point.

It is possible that we are wrong and it is a configuration error... we don't know.

In any case, we have worked around the problem by disabling batched sending of events, but at the cost of more operations in Redis and the extra CPU and RAM that this implies...

Any ideas?

I see the problem. You need a batch_timeout that is less than whatever the socket_timeout setting is.

We use a 3rd party library to handle the data transmission to and from Redis - it has a socket_timeout setting. I don't know if we explicitly set this when we create an instance of the 3rd party class.

Let me check.

In our system, the UDP and TCP socket timeouts are 30 seconds.

Although what you say may be true, it does not correspond to what is happening.

In normal operation, our Redis loads about 20,000,000 records in a few minutes.

The Logstash agent processes those 20 million records without problems, whether it pushes them back into Redis 10 at a time, 200 at a time, or 100K at a time. The problem is that when the main key has fewer items left to send, the redis output plugin seems to wait for more to appear until it reaches the number configured in batch_events.

Thank you

It looks like our config setting of timeout is sent to the Ruby Redis (3rd party) gem instance, and if I am reading the code correctly this value is used everywhere a timeout can be set. The Redis gem creates subclasses of the standard Ruby TCPSocket class and sets the timeout in this subclass. It's quite difficult to trace whether the value used at this level of the code was derived from the value we passed in higher up the call chain or is a default value - that is, whether our value was ignored.

The only option at the minute is to try setting the redis output's batch_timeout to less than 5 seconds (our default timeout value), or to explicitly set both...

batch_timeout => 5
timeout => 6
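
In the original output block, that would look something like this (just a sketch using the example values above; as noted, the timeout option is what appears to get handed down to the Redis gem for its socket handling):

output {
        redis {
                host => "${LOGSTASH_REDIS_HOST}"
                port => "${LOGSTASH_REDIS_PORT}"
                data_type => "list"
                key => "specific_key"
                batch => true
                batch_events => 100
                # flush partial batches well before the socket timeout can fire
                batch_timeout => 5
                # socket timeout (seconds) handed to the underlying Redis gem
                timeout => 6
                reconnect_interval => 30
        }
}

The exact numbers don't matter much; the point is to keep batch_timeout below timeout.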

It seems that after increasing the timeout value to 10, the system works.

Thanks for the support!


Great.

Curious, what info guided your decision to use a value of 10 seconds for the buffer_timeout?

Hi guyboertje,

Reviewing our ecosystem of agents (based on Filebeat) more closely, we realized that we had the setting multiline.max_lines = 1500, which in principle limits the number of lines of a composite trace to 1500.

Seeing that the timeout was also firing in Logstash without the batch configuration, we assumed that the failure occurred when trying to insert a single, very large trace.

We reduced multiline.max_lines to 500 and, with the timeout at 5 seconds, it worked correctly. Since we need to capture those larger traces, the only feasible solution is to give Logstash more time to insert them back into Redis, so we chose to double the timeout; that is where the value of 10 we mentioned comes from.

Regards,

Very nice. Evidence based is good.

I'm glad the issue is fixed. Your timing was spot on because I was looking at the redis output code earlier for another question. :slight_smile:

Hi,

In the end, the error was reproduced in our environment again.

Searching our logs, we discovered this kind of trace:

[2017-02-07T08:18:29,862][WARN ][logstash.outputs.redis ] DEBUG [2017-02-06 17:21:02,288] - LlamadaWidPadre.callHost[36] Parametres crida - Entorno:VW cia:ALZ modHost:UL03CO00, :identity=>"default", :exception=>#<Redis::TimeoutError: Connection timed out>, :backtrace=>[
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis/connection/ruby.rb:111:in `_write_to_socket'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis/connection/ruby.rb:105:in `_write_to_socket'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis/connection/ruby.rb:131:in `write'",
  "org/jruby/RubyKernel.java:1479:in `loop'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis/connection/ruby.rb:130:in `write'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis/connection/ruby.rb:374:in `write'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis/client.rb:271:in `write'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis/client.rb:250:in `io'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis/client.rb:269:in `write'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis/client.rb:228:in `process'",
  "org/jruby/RubyArray.java:1613:in `each'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis/client.rb:222:in `process'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis/client.rb:367:in `ensure_connected'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis/client.rb:221:in `process'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis/client.rb:306:in `logging'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis/client.rb:220:in `process'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis/client.rb:120:in `call'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis.rb:1070:in `rpush'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis.rb:58:in `synchronize'",
  "base_path/tools/logstash/vendor/jruby/lib/ruby/1.9/monitor.rb:211:in `mon_synchronize'",
  "base_path/tools/logstash/vendor/jruby/lib/ruby/1.9/monitor.rb:210:in `mon_synchronize'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis.rb:58:in `synchronize'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/redis-3.3.2/lib/redis.rb:1069:in `rpush'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-redis-3.0.3/lib/logstash/outputs/redis.rb:244:in `send_to_redis'",
  "org/jruby/RubyProc.java:281:in `call'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-json-3.0.2/lib/logstash/codecs/json.rb:42:in `encode'",
  "base_path/tools/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-redis-3.0.3/lib/logstash/outputs/redis.rb:150:in `receive'",
  "base_path/tools/logstash/logstash-core/lib/logstash/outputs/base.rb:92:in `multi_receive'",
  "org/jruby/RubyArray.java:1613:in `each'",
  "base_path/tools/logstash/logstash-core/lib/logstash/outputs/base.rb:92:in `multi_receive'",
  "base_path/tools/logstash/logstash-core/lib/logstash/output_delegator_strategies/legacy.rb:19:in `multi_receive'",
  "base_path/tools/logstash/logstash-core/lib/logstash/output_delegator.rb:42:in `multi_receive'",
  "base_path/tools/logstash/logstash-core/lib/logstash/pipeline.rb:331:in `output_batch'",
  "org/jruby/RubyHash.java:1342:in `each'",
  "base_path/tools/logstash/logstash-core/lib/logstash/pipeline.rb:330:in `output_batch'",
  "base_path/tools/logstash/logstash-core/lib/logstash/pipeline.rb:288:in `worker_loop'",
  "base_path/tools/logstash/logstash-core/lib/logstash/pipeline.rb:258:in `start_workers'"
]}

It isn't directly visible, which is why I attach the following picture:

The real problem is that our Logstash agent enters an infinite loop when a trace contains this kind of character sequence [NULNULNULNULNULNULNULNULNULNUL.....].

Our filter section is:

filter {
        if [message] =~ /(INFO|ERROR|DEBUG|WARN|FATAL|TRACE|SEVERE|NOTICE)(\s*\[20)/ {
                grok {
                        patterns_dir => ["${LOGSTASH_PATH_GROK_PATTERNS}"]
                        match => { "message" => "%{LOGLEVEL:loglevel}(\s*)\[?%{TIMESTAMP_ISO8601:mytimestamp}" }
                }
        } else if [message] =~ /(INFO|ERROR|DEBUG|WARN|FATAL|TRACE|SEVERE|NOTICE)(\s*\[+[a-zA-Z]+)/ {
                grok {
                        patterns_dir => ["${LOGSTASH_PATH_GROK_PATTERNS}"]
                        match => { "message" => "%{LOGLEVEL:loglevel}(\s*)(\[)?(\[(.*?)\])+(\])?(\s*)\[%{TIMESTAMP_ISO8601:mytimestamp}" }
                }
        } else if [message] =~ /\d{4}-\d{2}-\d{2}/ {
                grok {
                        patterns_dir => ["${LOGSTASH_PATH_GROK_PATTERNS}"]
                        match => { "message" => "%{TIMESTAMP_ISO8601:mytimestamp}" }
                }
        }
        date {
                match => [ "mytimestamp", "YYYY-MM-dd HH:mm:ss,SSS" ]
                locale => "en"
        }
        ruby {
                code => "event.set('lag_seconds', Time.now.to_f - event.get('@timestamp').to_f)"
        }
        # drop events older than 15 days (1296000 seconds)
        if [lag_seconds] > 1296000 {
                drop { }
        }
        mutate {
                remove_field => [ "mytimestamp", "type" ]
        }
}

It is for this reason that the socket timeout exception appears: the socket remains open, but Logstash (trapped in the infinite loop) never writes to it.
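
A possible workaround (just a sketch, untested) might be to strip the NUL characters in the filter section before the event reaches the json codec and the redis output, assuming the offending bytes really are literal \u0000 characters in [message]:

filter {
        mutate {
                # hypothetical workaround: remove NUL bytes from the message
                gsub => [ "message", "\u0000", "" ]
        }
}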

Any ideas?

Regards
