ERR Fail to publish event to REDIS: write tcp sourceHost:sourcePort->redisHost:redisPort: i/o timeout


#1

I have recently set up the ELK stack below for one of my applications:
FileBeat --> Redis --> Logstash --> ElasticSearch --> Kibana
My FileBeat config:

filebeat:
  prospectors:
    -
      paths:
        - LogPath
      input_type: log
      tail_files: true
output:
  redis:
    host: "<host>"
    port: port
    save_topology: true
    index: "filebeat"
    db: 0
    db_topology: 1
    timeout: 5
    reconnect_interval: 1
shipper:
logging:
  to_files: true
  files:
    path: /tmp
    name: mybeat.log
  level: error

My redis config:
> bind host
> port port
> tcp-backlog 511
> timeout 0
> tcp-keepalive 0
> daemonize no
> supervised no
> pidfile /var/run/redis.pid
> loglevel warning
> logfile "logpath"
> databases 16
> stop-writes-on-bgsave-error yes
> rdbcompression yes
> rdbchecksum yes
> dbfilename dump.rdb
> dir ./
> slave-serve-stale-data yes
> slave-read-only yes
> repl-diskless-sync no
> repl-diskless-sync-delay 5
> repl-disable-tcp-nodelay no
> slave-priority 100
> maxmemory 2GB
> maxmemory-policy volatile-lru
> appendonly no
> appendfilename "appendonly.aof"
> appendfsync everysec
> no-appendfsync-on-rewrite no
> auto-aof-rewrite-percentage 100
> auto-aof-rewrite-min-size 64mb
> aof-load-truncated yes
> lua-time-limit 5000
> slowlog-log-slower-than 10000
> slowlog-max-len 128
> latency-monitor-threshold 0
> notify-keyspace-events ""
> hash-max-ziplist-entries 512
> hash-max-ziplist-value 64
> list-max-ziplist-size -2
> list-compress-depth 0
> set-max-intset-entries 512
> zset-max-ziplist-entries 128
> zset-max-ziplist-value 64
> hll-sparse-max-bytes 3000
> activerehashing yes
> client-output-buffer-limit normal 0 0 0
> client-output-buffer-limit slave 256mb 64mb 60
> client-output-buffer-limit pubsub 32mb 8mb 60
> hz 10
> aof-rewrite-incremental-fsync yes

I am getting the below error in the FileBeat logs:

ERR Fail to publish event to REDIS: write tcp sourceHost:sourcePort->redisHost:redisPort: i/o timeout


(Steffen Siering) #2

Which filebeat version are you using? It's not recommended to use filebeat 1.x releases with redis (due to potential data loss). The redis output has been completely rewritten in 5.0alpha3, giving you features like:

  • proper error handling and error recovery
  • load balancing support
  • support for redis over SSL if redis is run behind stunnel for example
  • SOCKS5 proxy support

Have you checked with netstat on the filebeat and redis machines whether filebeat is actually connected? Is there any firewall in between?
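As a quick alternative to reading netstat output, a plain TCP probe from the filebeat host shows whether the redis port is reachable at all. This is a minimal sketch; `redisHost` and `6379` are placeholders for the host and port from your config:

```shell
# Probe a TCP port; returns 0 if a connection can be opened within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so no extra tools are required.
probe() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# redisHost/6379 are placeholders for your actual redis host and port.
if probe redisHost 6379; then
  echo "redis port reachable"
else
  echo "redis port unreachable - check firewalls and netstat on both ends"
fi
```

If the probe fails while redis is up, a firewall or routing problem between the two hosts is the likely culprit rather than filebeat itself.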

Have you tried increasing the timeout? The timeout of 5 seconds you configured is already huge.
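For reference, raising the timeout in the 1.x redis output would look like this (the value 30 is just an example to experiment with, not a recommendation):

```yaml
output:
  redis:
    host: "<host>"
    port: 6379
    timeout: 30            # seconds; try values larger than the current 5
    reconnect_interval: 1
```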


#3

Hi Steffen, thanks for your reply!
I have tried all 3 versions, i.e. 1.2.2, 1.2.3 and 5.0.alpha4. Below are my observations:
1.2.2 & 1.2.3 give the same result, i.e. they are able to connect to my redis server and send log events, but intermittently I see this error in beats.log:

ERR Fail to publish event to REDIS: write tcp sourceHost:sourcePort->redisHost:redisPort: i/o timeout

When I tried the 5.x version of FileBeat, for some reason it is not able to connect to the redis server (although all the details remain the same).
Below is my 5.x filebeat.yml:

filebeat.prospectors:
- input_type: log
  paths:
    - logPath
  multiline.pattern: "^[0-9]:[0-9]:[0-9]*,[0-9]?[0-9]?[0-9]?"
  multiline.negate: true
  multiline.match: after
  tail_files: false

output.redis:
  hosts: "hostname"
  port: 6379
  index: "filebeat"
  db: 0
  db_topology: 1
  timeout: 5s
  max_retries: 3

logging.level: info
logging.to_files: true
logging.files:
  path: /tmp
  name: mybeat.log

Log output from the 5.x version of Filebeat is as below (set at INFO level):

2016-07-01T21:59:50+08:00 INFO Home path: [filebeathome] Config path: [filebeathome] Data path: [filebeathome/data] Logs path: [filebeathome/logs]
2016-07-01T21:59:50+08:00 INFO Setup Beat: filebeat; Version: 5.0.0-alpha4
2016-07-01T21:59:50+08:00 INFO Max Retries set to: 3
2016-07-01T21:59:50+08:00 INFO Activated redis as output plugin.
2016-07-01T21:59:50+08:00 INFO Publisher name: publisher_name
2016-07-01T21:59:50+08:00 INFO Flush Interval set to: 1s
2016-07-01T21:59:50+08:00 INFO Max Bulk Size set to: 2048
2016-07-01T21:59:50+08:00 INFO filebeat start running.
2016-07-01T21:59:50+08:00 INFO Registry file set to: filebeathome/data/registry
2016-07-01T21:59:50+08:00 INFO No registry file found under: filebeathome/data/registry. Creating a new registry file.
2016-07-01T21:59:50+08:00 INFO Loading Prospectors: 1
2016-07-01T21:59:50+08:00 INFO Load previous states from registry into memory
2016-07-01T21:59:50+08:00 INFO Previous states loaded: 0
2016-07-01T21:59:50+08:00 INFO Starting Registrar
2016-07-01T21:59:50+08:00 INFO Loading Prospectors completed. Number of prospectors: 1
2016-07-01T21:59:50+08:00 INFO Start sending events to output
2016-07-01T21:59:50+08:00 INFO All prospectors are initialised and running with 0 states to persist
2016-07-01T21:59:50+08:00 INFO Starting prospector of type: log
2016-07-01T21:59:50+08:00 INFO Starting spooler: spool_size: 2048; idle_timeout: 5s
2016-07-01T22:00:00+08:00 INFO Run prospector
2016-07-01T22:00:10+08:00 INFO Run prospector
2016-07-01T22:00:20+08:00 INFO Run prospector
...

In the case of 5.x none of the log events reach the redis server, but with 1.2.2 & 1.2.3 at least most of them arrive, apart from that intermittent error.
Can you tell me what I can do in this regard?
Your help is much appreciated.
Thanks in advance.
Regards
Mrunal


(Steffen Siering) #4

In 5.0alpha4 the connections are established lazily (right before the first event is pushed), so that startup (e.g. init scripts) is not blocked if the network is unresponsive. I don't see any events being pushed yet, so no attempt to connect to redis is made.

Because the new redis output supports load balancing, output.redis.hosts requires an array:

output.redis:
  hosts: ["hostname:port"]
  save_topology: true
  db_topology: 1

Have you tried to telnet your redis server? e.g.

$ telnet host port
INFO
...

Since redis uses a mixed text/binary protocol, you can execute some commands like INFO directly via telnet.


#5

Hi Steffen,
As you suggested, I specified the host and port in filebeat's config file and started filebeat, but it's still the same issue: nothing is being sent to the redis server. I also telnetted to the redis host and port from the filebeat host and issued the INFO command; below is the output:

INFO

# Server

redis_version:3.2.0
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:926e4c00a759e20e
redis_mode:standalone
os:Linux
arch_bits:64
multiplexing_api:epoll
gcc_version:4.8.4
process_id:
run_id:
tcp_port:6379
uptime_in_seconds:593
uptime_in_days:0
hz:10
lru_clock:
executable:
config_file:

# Clients

connected_clients:2
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory

used_memory:898080
used_memory_human:877.03K
used_memory_rss:2396160
used_memory_rss_human:2.29M
used_memory_peak:898080
used_memory_peak_human:877.03K
total_system_memory:33669894144
total_system_memory_human:31.36G
used_memory_lua:32768
used_memory_lua_human:32.00K
maxmemory:2147483648
maxmemory_human:2.00G
maxmemory_policy:volatile-lru
mem_fragmentation_ratio:2.67
mem_allocator:libc

# Persistence

loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1467703634
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok

# Stats

total_connections_received:2
total_commands_processed:6913
instantaneous_ops_per_sec:11
total_net_input_bytes:216805
total_net_output_bytes:9263
instantaneous_input_kbps:0.34
instantaneous_output_kbps:0.01
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:2304
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0

# Replication

role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU

used_cpu_sys:0.12
used_cpu_user:0.18
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

# Cluster

cluster_enabled:0

# Keyspace

Also, the fact that filebeat 1.2.2 and 1.2.3 are able to send log events to the redis server (with intermittent i/o timeout issues) tells me that there is nothing wrong with the connectivity between the 1.x versions of filebeat and redis, but that something is wrong with the 5.0.alpha4 version. Can you tell me what could be done in the 1.x version itself to resolve the i/o timeout issue? I see a max_retries option in the 5.x version of filebeat, which suggests that filebeat will retry failed log events three times. Is this supported in the 1.x version?

Regards
Mrunal


(Steffen Siering) #6

Well, you can always try to increase the timeout, but I cannot recommend using filebeat 1.x with the redis output. In the first place, I wonder why the requests time out at all.

Do you have some logs (in debug mode) with 5.x release?


#7

Hi Steffen,
In my view, instead of a plain timeout, a feature that retries N times every N milliseconds (polling) should resolve this case. I wonder whether such a thing is available? Also, in my second post I added logs from 5.0.alpha4 (DEBUG), and as you can see there, it is not able to send any log events; it just keeps logging "Run prospector". The same setup with 1.2.3 does the job, apart from that intermittent i/o timeout issue. I am not sure why it times out.
For now I guess I will have to go ahead with this issue as is, until there is a concrete solution.


(Steffen Siering) #8

I cannot tell whether 1.2.3 did work correctly. 1.2.3 does not check the redis response or check for timeout errors, and therefore will not retry.

This is exactly what 5.0alpha4 is supposed to do.

If all you see is messages from the prospector, I wonder if there are any files to be harvested, as the harvesters and spooler should give you some messages too when debug logging is enabled. Did you delete the registry file between runs?

A network trace between filebeat-5.0alpha4 and redis would be interesting, to see what's happening at the network level:

$ tcpdump -w trace.pcap tcp port 6379

Are the redis commands sent correctly? Is there a response from redis?


#9

Hi Steffen,
For testing purposes I have now kept filebeat 1.2.3 running for more than 6 days, and I can see messages being sent to the redis instance and all the data being collected, but with the intermittent i/o timeout issue. In a parallel environment doing the same thing I have set up filebeat 5.0.alpha4, and still nothing has been sent to the redis instance. (Note: I have 2 ELK stacks, one running filebeat 1.2.3 and the other running 5.0.alpha4.) I did try the commands you suggested earlier, and things look OK for both setups. It's just that when filebeat 5.0.alpha4 runs with a similar config, nothing is sent to redis, so my guess is that something is wrong in that version.
On another note, can you tell me whether I can run 5.0.alpha4 in debug mode with the source code, to see the complete flow and understand where the break is?


(Steffen Siering) #10

Which commands exactly have you run?

Can you share filebeat 5.0alpha4 config?

Have you deleted the registry file before starting filebeat?

Any log output from filebeat?

Have you created a network trace as requested?

Debug logging for redis can be enabled by adding the logging selector 'redis', either in your config file (logging section) or via the command line flag -d 'redis'. If filebeat tries to publish any events, you will see a 'connect' message from the redis output plus, when INFO mode is enabled (e.g. -v on the command line), some state updates from filebeat:
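In config-file form, the selector Steffen mentions would go into the logging section; a sketch based on his reply:

```yaml
logging:
  level: debug
  selectors: ["redis"]   # only emit debug messages from the redis output
  to_files: true
  files:
    path: /tmp
    name: mybeat.log
```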

2016/07/13 12:32:17.048831 publish.go:111: INFO Events sent: 2048
2016/07/13 12:32:17.050219 registrar.go:237: INFO Registry file updated. 1 states written.

We have some automated (well, still minimal) integration tests for redis, and I just successfully tested the redis output with alpha4.

My filebeat config:

filebeat.prospectors:
- input_type: log
  paths:
    - tmp/nasa/*.log

output.redis:
  hosts: ["localhost"]
  index: "test"
  db: 0
  datatype: "list"

There is no need to configure any of the topology options in the filebeat redis output, as topology is a packetbeat-only feature.

I tested with NASA HTTP access log: http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html

My test system has redis 3.0.5 installed. I still need to test manually with a more recent release, but the integration tests use redis 3.2.0.

My corresponding logstash config:

input {
    redis {
        host => "localhost"
        key => "test"
        data_type => "list"
        db => 0
    }
}

output {
    stdout {
        codec => "dots"
    }
}

This config prints one dot per received event. Remove or comment out the codec field in stdout to print the message field to standard out.

Running filebeat + redis + logstash with these configs happily spams my console until all logs have been processed.


#11

Hi Steffen,
I think I now know what the issue is: the 5.x filebeat version does not support symlinks. In our current setup a symlink points to the latest log file, and in 1.2.3 I was specifying that symlink path for sourcing logs. It occurred to me to run 5.x in debug mode with a single log file specified directly, without the symlink, and it started running. So the issue with the 5.x version is that it skips a path if it is a symlink, which in my case it always will be.
Log excerpts:

2016-07-18T07:12:40-04:00 DBG File <symlink_path> skipped as it is a symlink.
2016-07-18T07:12:40-04:00 DBG File <symlink_path> skipped as it is a symlink.
2016-07-18T07:12:40-04:00 DBG File <symlink_path> skipped as it is a symlink.
...

So I guess I have hit a dead end here. Any suggestions or workarounds?

Regards
Mrunal


(Steffen Siering) #12

This is the issue for removing symlink support in filebeat. I understand symlinks are somewhat tricky at times, leading to additional edge cases in filebeat. As far as I can tell, there is currently no option to enable symlink support. Workarounds might include hard links, or copying the files to be forwarded.
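The hard-link workaround can be sketched like this (all paths are made up for the demo, not taken from this thread). Because a hard link is an ordinary directory entry sharing its target's inode, filebeat sees it as a plain file rather than a symlink:

```shell
# Demo setup: a "rotated" log file plus a symlink, mimicking the rotation
# scheme described in this thread (paths are hypothetical).
mkdir -p /tmp/logdemo
printf 'event 1\n' > /tmp/logdemo/app-20160718.log

# The rotation process would normally repoint this symlink at the newest
# file; filebeat 5.0.alpha4 skips it.
ln -sf /tmp/logdemo/app-20160718.log /tmp/logdemo/current.log

# A hard link to the same file shares the target's inode, so it is not
# skipped as a symlink.
ln -f /tmp/logdemo/app-20160718.log /tmp/logdemo/current.hard
```

The catch, as later replies note, is that the rotation process would have to re-create the hard link on every rollover, just as it repoints the symlink today.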


(ruflin) #13

@mrunalgosar Can you share some more details on how it works with your symlink? It always helps to understand the use case before we implement a potential flag to enable symlinks again.


#14

@ruflin We have a background process which rolls the log files over when the day ends, when the file size exceeds a certain limit, or when an application restart is triggered; all log events are then sent to the new log file. So we have a symlink which always points to the latest log file created, which is why I specified it in the "paths" property of filebeat. If the symlink feature is removed from filebeat 5.x, we will not be able to upgrade from 1.x to 5.x at any time. Can this feature be made configurable, leaving it to the end user to enable or disable it?

Regards
Mrunal


#15

@steffens It seems a hard link is not possible for us, as we have a background process which switches the symlink to the latest log file whenever the log files roll over. So I second the point mentioned in that issue: keep the symlink feature configurable.

Regards
Mrunal


(Steffen Siering) #16

Why do you need symlinks? Why not point filebeat at the location the files are placed into? filebeat detects files being renamed, keeps reading the renamed file until it is fully processed, and then continues with the new file. With symlinks here, I'm not sure whether events are lost, or whether ingestion of the new file is deferred because the previous log file is still being processed.
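Pointing filebeat at the rotation directory instead of the symlink could look like this; the path and glob pattern are hypothetical, standing in for the timestamped names described later in the thread:

```yaml
filebeat.prospectors:
- input_type: log
  paths:
    - /path/to/logs/app-*.log   # hypothetical pattern matching every rotated file
```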


#17

@steffens Symlinks come in handy in that we don't have to know where the latest log file is. All the log files are placed inside the same directory, so it's quite difficult to look for the latest log file; a symlink helps us there.


(ruflin) #18

You write "to find the most recent log file": is that for you manually, or for filebeat? Because filebeat would do that automatically.


#19

For us manually, but also: how would filebeat come to know automatically? There are several log files in the same folder with the same extension and name pattern, with only a slight difference in the timestamp prefixed to their names. I would suggest that we keep the symlink feature configurable, as opposed to completely removing it.


(ruflin) #20

Filebeat does the detection automatically, as it doesn't track files by name but by inode and device id. It seems like removing the symlink would only make your manual lookup harder. At the same time, I would hope that you don't have to touch the log files manually anymore, as you have all the data in elasticsearch.
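The inode-based tracking ruflin describes can be seen with a tiny demo (paths are hypothetical): renaming a file, as log rotation does, leaves its inode unchanged, which is why filebeat keeps reading the same file across a rename.

```shell
# Create a file, note its inode, rename it, and compare (demo paths only).
mkdir -p /tmp/inodedemo
printf 'log line\n' > /tmp/inodedemo/app.log
before=$(stat -c %i /tmp/inodedemo/app.log)

mv /tmp/inodedemo/app.log /tmp/inodedemo/app-rotated.log   # simulated rotation
after=$(stat -c %i /tmp/inodedemo/app-rotated.log)

[ "$before" = "$after" ] && echo "same inode: filebeat keeps tracking it"
```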

For the config option: please open a feature request for it on GitHub and link to this issue: https://github.com/elastic/beats/issues/1686 The most efficient way to get it in would be a PR :wink: