Weird Filebeat init script issue


(JD) #1

I'm trying to get filebeat to install via Saltstack, or even just a remote bash script over ssh, which seems to work great, up until I need to restart the filebeat service. With either method, the script just hangs executing the service filebeat start command.

At first, I thought this was some kind of weird issue with Salt, bash, or ssh, but after some additional testing, it seems to be a strange issue with the 'filebeat-god' executable. If I extract out the command from the init script and run it from the command line, like so:

start-stop-daemon --start --pidfile /var/run/filebeat.pid --exec /usr/bin/filebeat-god -- -r / -n -p /var/run/filebeat.pid -- /usr/bin/filebeat -c /etc/filebeat/filebeat.yml'

It works fine. If I put it in a bash script and run it locally, it works fine. Same thing happens if I just use the service filebeat start command from a bash script. But as soon as I try to run the command from the Salt master, or via ssh remotely, it just hangs. Seems like simple thing, and I know it seems a little crazy, but I haven't been able to figure it out.

Has anyone else seen this problem?


(ruflin) #2

@jderieg Could you share the log messages from filebeat?


(Dan Garthwaite) #3

tl;dr use_vt: True fixes this.

I have no logfile to share. Here is a partial process tree:

root     14887  0.0  0.2  63668  2104 pts/2    S    19:21   0:00              \_ sudo salt-call --local service.start filebeat -l de
root     14888  0.4  5.1 302900 52600 pts/2    S    19:21   0:00              |   \_ /usr/bin/python /usr/bin/salt-call --local serv
root     14915  0.0  0.0      0     0 pts/2    Z    19:21   0:00              |       \_ [filebeat] <defunct>

lsof -p 14915 shows nothing

There is a new-ish option "use_vt" which seems to fix the issue in states. You can not specify use_vt when calling the service module from the command line.


(ruflin) #4

@DanG Can you share some more details on this? So if use_vt is set to true filebeat starts as expected? Does this mean this is more a saltsack issue of something that should be fixed on the filebeat side?


(Dan Garthwaite) #5

I couldn't say. The use_vt flag in salt was added to deal with badly
behaving scripts. It is strange behavior.


(Perlchild) #6

If the init script uses sudo and sudo has requiretty(which iirc is the default) it would explain this


(JD) #7

Thanks for all the help guys! The use_vt flag in salt worked.


(JD) #8

ugh. Unfortunately, the 'use_vt' flag just exited out of the script, and the 'service filebeat start' command still hangs and never actually finishes. Back to square one.


(JD) #9

Finally got it working like this:
sudo service filebeat start > /dev/null 2>&1 &


(ruflin) #10

@jderieg Thanks for the update. Did you find out what the issue is / was?


(JD) #11

Not really certain. The 'filebeat-god' executable that the init file runs (using the start-stop-deamon) is a binary, so I'm still not sure what's causing it to hang on remote execution without the additional output settings above.


(Phil) #12

I appear to be having a similar issue. If I run the init.d script by hand (/etc/init.d/filebeat start), I see filebeat-god and filebeat start up properly.

If I issue service filebeat start, it does not work. If I construct and use a systemd .service file, it does not work.

The hardest thing is that nothing leaves a single log message anywhere about what the problem is.

Can anyone suggest anything?


(Frenkye) #13

Hi,

I have exactly same problem with salt. If service isn't running salt is unable to start it. Here is my strace:


read(12, "s were set und", 14) = 14 fstat(12, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 lseek(12, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) read(12, "er output.geoip.", 16) = 16 fstat(12, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 lseek(12, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) read(12, "paths\n", 18) = 6 read(12, "2015/12/04 1", 12) = 12 fstat(12, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 lseek(12, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) read(12, "6:25:34.998509 outpu", 20) = 20 fstat(12, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 lseek(12, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) read(12, "ts.go:111: INFO Activat", 23) = 23 fstat(12, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 lseek(12, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) read(12, "ed logstash as output plug", 26) = 26 fstat(12, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 lseek(12, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) read(12, "in.\n2015/12/04 16:25:34.99854", 29) = 29 fstat(12, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 lseek(12, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) read(12, "7 publish.go:249: INFO Publisher"..., 33) = 33 fstat(12, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 lseek(12, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) read(12, "name: frenkye-salt-3\n2015/12/04 "..., 37) = 37 fstat(12, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 lseek(12, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) read(12, ":34.998673 beat.go:107: INFO Ini"..., 41) = 41 fstat(12, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 lseek(12, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) read(12, "ilebeat; Version: 1.0.0\n", 46) = 24 read(12, 0x7fc59caf9eb3, 22) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3623, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- read(12,

For me I'am pretty much begginer at debugging and what I notice is that last "read(12," should not be there, because it from my point of view and other service there follow close() and not read(). Also the SIGCHLD correspond to Zombie process:


root 3623 0.0 0.0 0 0 pts/0 Z+ 17:25 0:00 [filebeat]
root 3636 0.0 0.0 9400 576 pts/0 Sl+ 17:25 0:00 /usr/bin/filebeat-god -r / -n -p /var/run/filebeat.pid -- /usr/bin/filebeat -c /etc/filebeat/filebeat.yml
root 3637 4.0 1.2 224840 26504 pts/0 Sl+ 17:25 0:01 /usr/bin/filebeat -c /etc/filebeat/filebeat.yml

I don't know if I'am right anyone experienced who could check this out?


(ruflin) #14

There is now also a Github issue which seems to be related to this: https://github.com/elastic/filebeat/issues/309


(Steffen Siering) #15

@frenkye

tl;dr strace unrelated to startup scripts.

The strace seems incomplete. I'm basically missing the 'open' call so we can see which file is being opened.

But st_mode=S_FIFO is telling me the file actually being no seekable file on a block-device, but a FIFO/pipe (which is not seekable, create with mkfifo?). Given the fact it's using file descriptor 12 I wonder about your filebeat config and the process writing to this file. And if you try to forward some content directly from one process via a fifo.

If forwarding fifo (named pipe), you have to use stdin. For 'named pipe' support in general feel free to create a github issue or another first open another discuss thread including filebeat config and some description of your setup.

Thanks.


(Cybacolt) #16

I'm really struggling with this one too. all suggestions do NOT work with salt 2015.8.x:

  • use_vt (either starting with service.running or cmd.run) fails
  • all combination of sudo / su -c fails from cmd.run
  • adding Default !requiretty to /etc/sudoers fails (this is default anyway?)
  • ssh -t -t also fails with most commands

the only thing i've got working is this:
ssh -t -t root@myhost "su -c 'service filebeat start'"

but the idea of looping ssh back to localhost with its own pub key to restart a service just feels dirty...

unfortunately is a breaker for me. if i cant get this working with saltstack soon, i'm going to have to look at other options for posting to logstash.

I would really appreciate any ideas on how to fix this!


(ruflin) #17

Can someone try the "raw" binary without the god-deamon command inside to see if that has an affect? Best would be to build it directly from source with the make command for your platform.


(Cybacolt) #18

as a followup, i've got this working with a ssh -t -t loopback. here is my working filebeat/service.sls:

filebeat.sshkeygen:
cmd.run:
- name: ssh-keygen -f /root/.ssh/filebeat -P ""
- unless:
- ls /root/.ssh/filebeat

filebeat.pubkeytoauth:
cmd.run:
- name: cat /root/.ssh/filebeat.pub >> /root/.ssh/authorized_keys
- unless: cat /root/.ssh/filebeat.pub | grep -f - /root/.ssh/authorized_keys
- require:
- cmd: filebeat.sshkeygen

filebeat.service:
cmd.run:
- name: ssh -t -t -i /root/.ssh/filebeat root@localhost "su -c 'service filebeat start'"
- unless:
- service filebeat status
- require:
- pkg: filebeat
- cmd: filebeat.sshkeygen
- cmd: filebeat.pubkeytoauth


(Ejether) #19

I am also affected by this problem. None of the listed workarounds were working for me so I set filebeat to run under supervisor instead. Seems to be working just fine.

Hope that helps.

[program:filebeat]
command=/usr/bin/filebeat -c /etc/filebeat/filebeat.yml

(Johanek) #20

I have the same problem, with salt 2015.5 on CentOS 6.

My fix has been update the init script so that the daemon command runs redirecting stdout/stderr to /dev/null. As per: https://github.com/dneg/beats/commit/d4817cdfd5b451093db8bfbf91d3493b84de23d0

I could PR that, but I'm not sure it's the correct thing to do