Issue with logstash on aarch64

Hello,

I'm having an issue with logstash running on aarch64. At this point, I don't believe it to be config based, as I ran config test on the various files in conf.d AND these files run on other hosts (that are x86_64). I only note the arch in the interest of full disclosure.

I've found that three specific input files, when placed in /etc/logstash/conf.d result in stack traces in /var/log/logstash/logstash-plain.log. The output looks something like:

[2018-05-23T09:40:11,866][ERROR][logstash.pipeline        ] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:main
  Plugin: <LogStash::Inputs::File path=>["/var/log/logstash/logstash-plain.log"], type=>"logstashlogs", tags=>["systems"], codec=><LogStash::Codecs::Multiline auto_flush_interval=>1, pattern=>"^\\[%{TIMESTAMP_ISO8601}", negate=>true, what=>"previous", id=>"c86c336a-60a4-4185-aa9c-67e83a98dc13", enable_metric=>true, charset=>"UTF-8", multiline_tag=>"multiline", max_lines=>500, max_bytes=>10485760>, sincedb_path=>"/var/db/logstash-sincedb/logstashlogs", id=>"58cf273996c5add287b29935f91c4a1bc3ce9b19121eb55d54cf29bb75b4aaef", enable_metric=>true, stat_interval=>1, discover_interval=>15, sincedb_write_interval=>15, start_position=>"end", delimiter=>"\n", close_older=>3600>
  Error: Operation not permitted - No message available
  Exception: Errno::EPERM
  Stack: org/jruby/RubyFile.java:639:in `chown'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/helper.rb:45:in `atomic_write'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/tail_base.rb:197:in `_sincedb_write'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/observing_tail.rb:103:in `observe_read_file'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/observing_tail.rb:43:in `block in subscribe'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/watch.rb:255:in `block in each'
org/jruby/RubyArray.java:1734:in `each'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/watch.rb:217:in `block in each'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/watch.rb:357:in `block in synchronized'
org/jruby/ext/thread/Mutex.java:148:in `synchronize'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/watch.rb:357:in `synchronized'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/watch.rb:118:in `each'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/watch.rb:280:in `subscribe'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/observing_tail.rb:23:in `subscribe'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-file-4.0.3/lib/logstash/inputs/file.rb:305:in `run'
/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:524:in `inputworker'
/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:517:in `block in start_input'

I've looked at permissions on the various logstash config files and they're good. I've also looked at the files that logstash appears to be monitoring and they're also OK. They're not FULLY owned by the logstash user, but they are owned by the logstash group and have the read bit set; so logstash shouldn't have any issues (to my knowledge) extracting information from them. The reason I'm talking about permissions is because of the following snippet in the above block:

  Error: Operation not permitted - No message available
  Exception: Errno::EPERM
  Stack: org/jruby/RubyFile.java:639:in `chown'

I am not clear what is attempting to be done here. However, when these three files input files (or any one of them) are placed in conf.d we see continuous stack traces and logstash restarts itself.

Has anyone seen this before? I'm new to supporting logstash, so I apologize if I've left something out. Various search engine searches have led me nowhere thus far.

Thanks!

It is chowning the sincedb. So you should be looking at the ownership of the files and directories under /var/db/logstash-sincedb.

Hi,

Thanks! I'm not entirely clear what's being done here, either. Basically, I am starting logstash with all of our config files in /etc/logstash/conf.d. The odd thing is that on the system in question: the /var/db/logstash-sincedb file belongs to the logstash group, but the file owner is root (i.e. root:logstash). However, on other systems that are working, the permissions are the same and yet there are no stack traces in the log.

Are there specific permissions that need to be set? I've tinkered with what I think they should be in order to work, to no avail.

David

Looking at an old version of the code, it is doing a stat on the file, creating a new file, renaming the new file to have the name of the old file (thus replacing it) and trying to chown the new file to have the same ownership as the old file. The file, not the directory. If the file is owned by root the chown will fail.

Thanks for clarifying. While awaiting your response I did look at the files within that dir and they ARE owned by logstash:

[root@cav01 logstash-sincedb]# ls -al
total 16
drwxr-x--- 2 logstash logstash 68 May 23 09:40 .
drwxr-xr-x 4 root     root     58 May 22 13:44 ..
-rw-r--r-- 1 logstash logstash  8 May 23 09:27 auditd
-rw-r--r-- 1 logstash logstash 26 May 23 09:40 logstashlogs
-rw-r--r-- 1 logstash logstash  8 May 23 09:25 mcelog
-rw-r--r-- 1 logstash logstash  8 May 23 09:25 momlog

And the top level directory is owned by logstash, too:

[root@cav01 logstash-sincedb]# cd ..
[root@cav01 db]# ls -al
total 12
drwxr-xr-x   4 root     root       58 May 22 13:44 .
drwxr-xr-x. 21 root     root     4096 Mar 15  2017 ..
drwxr-x---   2 logstash logstash   68 May 23 09:40 logstash-sincedb
-rw-r--r--   1 root     root     5345 Nov 30 11:14 Makefile
drwx------   3 root     root       22 May 22 13:15 sudo

I am looking at other systems in the cluster to see if this isn't true for them as well (we're seeing this issue on an entire group of machines). On one system I do see what appears to be copies of files being renamed (at least, appended with some sort of information):

[root@cav02 logstash-sincedb]# ls -latr
total 16
drwxr-xr-x 4 root     root      58 May  7 16:22 ..
-rw-r--r-- 1 logstash logstash   0 May 18 17:34 logstashlogs.2048.86975.106821
-rw-r--r-- 1 logstash logstash   0 May 19 06:42 logstashlogs.2048.42637.988515
-rw-r--r-- 1 logstash logstash  26 May 23 05:51 slurmd.2052.76322.277653
-rw-r--r-- 1 logstash logstash  26 May 23 11:38 slurmd
-rw-r--r-- 1 logstash logstash  63 May 23 11:39 syslog
-rw-r--r-- 1 logstash logstash 239 May 23 11:39 logstashlogs
drwxr-x--- 2 logstash logstash 162 May 23 11:39 .

However, on that second machine we're also seeing, according to /var/log/logstash/logstash-plain.log that logstash is restarting itself about every 44 seconds and is noting similar issues as initially posted (i.e. the chown stack trace).

Everything looks like it should work. But that . at the end of the listing for the directory /var means you have additional security permissions configured. I have not used security contexts, so I do not know if they are significant here.

Thanks! I'll give that a look. To my knowledge, we're not using anything like SElinux, but I will double check. Glad I'm not the only one who's seeing this as a bit bizarre!

Yeah, we aren't using selinux. What's also interesting is that other files get created without incident in /var/db/logstash-sincedb for things like auditd or mcelog.

The only thing I can think of is group membership. Is logstash a member of the logstash group? Could you have a local passwd entry overriding LDAP or something like that? Also, does 'ps -p <pid> -o pid,euid,egid,fuid' show what you expect it to show? The directory is not a remote mount, right?

Our logstash user is a system-level account with nologin

[root@cav01 ~]# id logstash
uid=296(logstash) gid=985(logstash) groups=985(logstash),0(root)
[root@cav01 ~]# grep logstash /etc/passwd
logstash:x:296:985:logstash:/usr/share/logstash:/sbin/nologin

And:

[root@cav01 ~]# ps -p 69796 -o pid,euid,egid,fuid
  PID  EUID  EGID  FUID
69796   296   985   296

And you're correct, this is NOT a remote mount of any sort.

Thanks for trudging through this with me! I've been staring at it for days trying various combinations of things to no avail.

Maybe some more logs/context can help: https://pastebin.com/MCHTmyzD

Basically, logstash will spit out the stack traces in the paste and restart indefinitely until I stop it. I've explored which avenues seemed obvious (plugins, permissions, etc) to no avail.

I've even tried changing the sincedb_path in my input file to something more friendly to no avail. The same stack traces persist.

I am wondering if it is a race condition. You have 96 pipeline worker threads. Maybe two of them are going after the sincedb at the same time and that is causing the problem. Can you try '--pipeline.workers 1' and see if that eliminates the problem, then increase it until the problem reappears, then reduce it by, umm, a bit :wink:

Although, come to think of it, I am not sure sincedb handling would be done in a pipeline worker.

I'll still give it a shot. FWIW, I also tried going to the latest version of logstash on this one particular node and the problem still persists. At this point I can't tell if it's something system specific (as these are ARM chips). I wonder if it's not a configuration like you mention.

It's funny you mention race-conditions because I've observed something that seems off to me. So, if I start with just one trouble-making configuration file in conf.d logstash seems to start without issue and has no problems; BUT the sincedb file isn't created in this first launch. When I stop logstash, the sincedb file appears in the correct location. And then when I restart logstash the stack traces persist.

I don't know if this is standard/expected behavior. I'm conceiving of the sincedb file as behaving more like a PID file (i.e. it gets created upon start) whereas it could be a check-point file of sorts that saves place. I'm sure I've just misconceived it, but I wanted to note the observation just in case.

Just an update: the problem persists with the pipeline.workers cranked way down, too. I did find something similar to what I was seeing in an old github thread:

And I tried exporting the SINCEDB_DIR var as a part of the options in startup.options:

read -r -d '' PRESTART << EOM
export SINCEDB_DIR="/var/db/logstash-sincedb"
EOM

Also to no avail.

In the interest of being a good citizen of the net (and thus avoiding this: https://xkcd.com/979/) I'd like to give a status update. I don't think we're completely out of the woods (yet) BUT I wanted to share some lessons learned so far:

  • Building from source on my native platform seems to help
    • This required multiple iterations and building various versions so as to not introduce something TOO new into our infrastructure, but also something that would work
    • All of the errors we were seeing in the logstash logs did not add up, as we made sure permissions were as widely open as possible and still had issues.
    • So, if you're experiencing issues you can't explain, and others are stumped too, and you're on an usual system arch (like ARM) building from source is probably your best bet (despite the fact that the package is a .noarch). I read somewhere that logstash and the like are NOT tested on ARM but "should" work
  • If you're behind any sort of proxy trying to build logstash from source, make sure you update the gradle.properties file within the dir you cloned with git with your proxy information. Gradle does NOT leverage system environment variables. Example gradle.properties syntax below (for HTTP and HTTPS proxies). If you don't explicitly declare your proxy info then rake bootstrap will choke out trying to download gradle or some other necessary piece of the build process
systemProp.http.proxyHost=<YourHTTPProxyAddressHere>
systemProp.https.proxyHost=<YourHTTPSProxyAddressHere>
systemProp.http.proxyPort=<YourHTTPProxyPortHere>
systemProp.https.proxyPort=<YourHTTPSProxyPortHere>

Even after building from source on my platform permissions issues persist, sadly.

[2018-06-05T08:24:49,054][WARN ][logstash.pipeline        ] CAUTION: Recommended inflight events max exceeded! Logstash will run with up to 12000 events in memory in y[92/1860]
nt configuration. If your message sizes are large this may cause instability with the default heap size. Please consider setting a non-standard heap size, changing the batch si
ze (currently 125), or changing the number of pipeline workers (currently 96) {:pipeline_id=>"main", :thread=>"#<Thread:0x11a0c4bc run>"}
[2018-06-05T08:24:51,308][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x11a0c4bc run>"}
[2018-06-05T08:24:52,402][INFO ][logstash.agent           ] Pipelines running {:count=>1, :pipelines=>["main"]}

[2018-06-05T08:24:54,769][ERROR][logstash.inputs.file     ] each: watched?: /var/log/messages: (#<Errno::EPERM: Operation not permitted - No message available>)

[2018-06-05T08:24:55,298][ERROR][logstash.inputs.file     ] each: watched?: /var/log/slurm/slurmd.log: (#<Errno::EPERM: Operation not permitted - No message available>)



[2018-06-05T08:24:56,177][ERROR][logstash.inputs.file     ] each: watched?: /var/log/logstash/logstash-plain.log: (#<Errno::EPERM: Operation not permitted - No message availabl
e>)
[2018-06-05T08:24:56,944][ERROR][logstash.pipeline        ] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:main
  Plugin: <LogStash::Inputs::File path=>["/var/log/messages"], type=>"syslog", sincedb_path=>"/var/db/logstash-sincedb/syslog", id=>"4e864cab1a32e9e98e5a1764e122d6eb5d7e13cb043
d7cd2a106670879c425a8", enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_51d5a2ec-33cc-4d52-98fa-e609638847ff", enable_metric=>true, charset=>"UTF-8">, stat_inte
rval=>1, discover_interval=>15, sincedb_write_interval=>15, start_position=>"end", delimiter=>"\n", close_older=>3600>
  Error: Operation not permitted - No message available
  Exception: Errno::EPERM
  Stack: org/jruby/RubyFile.java:639:in `chown'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/helper.rb:45:in `atomic_write'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/tail_base.rb:197:in `_sincedb_write'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/observing_tail.rb:103:in `observe_read_file'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/observing_tail.rb:43:in `block in subscribe'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/watch.rb:255:in `block in each'
org/jruby/RubyArray.java:1734:in `each'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/watch.rb:217:in `block in each'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/watch.rb:357:in `block in synchronized'
org/jruby/ext/thread/Mutex.java:148:in `synchronize'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/watch.rb:357:in `synchronized'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/watch.rb:118:in `each'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/watch.rb:280:in `subscribe'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/filewatch-0.9.0/lib/filewatch/observing_tail.rb:23:in `subscribe'
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-file-4.0.5/lib/logstash/inputs/file.rb:305:in `run'
/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:514:in `inputworker'
/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:507:in `block in start_input'

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.