This works fine. I'm capturing the filter's processing time in the "elapsed" field, and it's consistently around 0.001 to 0.002 seconds. However, now that I've had a taste of Redis connectivity, I want it to be faster!
So, I was wondering if it is possible to use the Redis Ruby library to connect to a local UNIX socket file on the host machine's file system. I want to know if the Ruby filter is sandboxed in some fashion. I'd prefer to connect to the socket file to avoid the round trip through the network stack. Even though loopback connections are very fast, a Unix domain socket skips the TCP/IP stack entirely.
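In redis-rb terms, I'm hoping something like this would work (assuming the server is listening on /tmp/redis.sock):

rc = Redis.new(path: "/tmp/redis.sock", db: 3)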
I suspect the overhead of setting up a connection for each event will overshadow any benefit of using a Unix socket, but to answer your question, no, there's no sandbox here.
I suspect you're right. I don't know of any way to create a persistent Redis connection, though. I'd love to be able to do that. Speed is what this game is all about. Thanks!
As it turns out, the local socket connection is faster, enough so that it's worth using it instead of the network connection. It's consistently below 1 msec now, usually around 0.0007 seconds. Here's the redesigned filter, which doesn't raise a Ruby exception if the Redis server can't be reached.
filter {
  # Test filter that just uses the Ruby filter to add a field to every record
  ruby {
    init => "require 'redis'; require 'time'"
    code => 'start_time = Time.now
      begin
        # rc = Redis.new(host: "127.0.0.1", port: 6379, db: 3)
        # Note: the port option is ignored when a socket path is given
        rc = Redis.new(path: "/tmp/redis.sock", db: 3)
        event.set("redis_status", "OK")
        event.set("redis_val", rc.hget("redis_test", "first_val"))
      rescue
        event.set("redis_status", "Cannot connect")
      end
      end_time = Time.now
      event.set("redis_elapsed", end_time - start_time)'
  }
}
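For reference, the test hash the filter reads can be seeded with a couple of lines of Ruby (a minimal sketch; "hello" is just a placeholder value):

require "redis"

# Same socket and db as the filter above
rc = Redis.new(path: "/tmp/redis.sock", db: 3)
rc.hset("redis_test", "first_val", "hello")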
It would be useful to be able to establish a persistent connection to the Redis instance to speed this up further. Any chance of that? I imagine such a thing would be great for JDBC database connections as well.
It's also a good idea to pass a short connect_timeout value to Redis.new. I set mine to 0.0005 seconds, so the rescue block is invoked sooner if a connection can't be established. Otherwise the time spent in the filter is much longer, slowing down the whole pipeline. Better to fail fast.
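In redis-rb that looks something like this (a minimal sketch; connect_timeout bounds only the connection attempt, and read_timeout/write_timeout are also available if you want to bound the commands themselves):

rc = Redis.new(path: "/tmp/redis.sock", db: 3, connect_timeout: 0.0005)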
I thought the init section was simply there to declare the plugin's requirements so they load first. Does the init section run only once, when Logstash starts? If so, do the variables defined in it have global scope across all Ruby filters? That could do the job...
Very good. I will give it a try and let you know if it works, and if so, what effect (if any) it has on the performance of the filter. Thanks for the idea!
This worked, but I had to explicitly define the Redis connection variable in global scope with the $ sigil. See below. Many thanks for the suggestion, Magnus!
filter {
  # Test filter that just uses the Ruby filter to add a field to every record
  ruby {
    init => 'require "redis";
      require "time";
      # Note: the port option is ignored when a socket path is given
      $rc = Redis.new(path: "/tmp/redis.sock", db: 3)'
    code => 'start_time = Time.now
      begin
        event.set("redis_status", "OK")
        event.set("redis_val", $rc.hget("redis_test", "first_val"))
      rescue
        event.set("redis_status", "Cannot connect")
      end
      end_time = Time.now
      event.set("redis_elapsed", end_time - start_time)'
  }
}
One more note: it appears that globals defined in a Ruby filter's init block are TRULY global. I created another filter similar to the one described above, except without the Redis.new declaration in the init block, and I was still able to access $rc from that filter.
So the caveat here is that you must make sure you define distinct global variables across all Ruby filters which use this technique for sharing connections.
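For example (hypothetical names for two independent filters using this pattern):

# Filter A's init block
$rc_users = Redis.new(path: "/tmp/redis.sock", db: 3)

# Filter B's init block -- a distinct name so it doesn't clobber Filter A's connection
$rc_sessions = Redis.new(path: "/tmp/redis.sock", db: 4)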
Also, the time savings from using a single shared Redis connection are substantial. The average elapsed time now dips as low as 0.049 msec, though it hovers around 0.07 msec most of the time.