EntitlementBootstrap failure preventing startup: AttachNotSupportedException: Unable to open socket file

[this comment has no technical discussion in it]

Sorry huge miscommunication on my side when I said this:

for now, we can workaround the problem and there doesn’t seem to be any action on the Elasticsearch side.

Meant to say “doesn’t seem to be any action needed on the Elasticsearch side”. This was really poorly worded on my part and what I meant is that since we haven’t determined there is a root cause [yet] or actionable thing for Elasticsearch to do, I can’t yet file a bug for you all to even take action on.

Instead, what I originally wrote makes it sound like you all aren’t doing anything to help. I think this forum is incredibly helpful and very fast to respond. There aren’t a lot of products where devs (I assume, from this post and others) actually respond to topics seemingly every day. I realize this is not a support channel, though sometimes I treat it as such and I’m very appreciative of all the help I’ve received!

4 Likes

FWIW that was how I read your message, but also thank you for the kind words :elasticheart:

Knowledge dumping here…Lucky for you all this will likely be my last post on this for a while.

The strace of the ls -al command on the dir on which the volume is mounted shows something interesting. The lgetxattr and getxattr syscalls on it show -1 EOPNOTSUPP (Operation not supported) on the volume that causes an issue, and -1 ENODATA (No data available) on the volume that has no issue. Keep in mind that the ls -al command appears to work just fine in both environments, meaning it shows the right output. See [1] for more details. My favorite LLM tells me that this EOPNOTSUPP error just indicates that the underlying filesystem doesn't implement these extended attributes - NOT that there's a problem and this by itself shouldn't be a cause for the JVM to not be able to attach.

The strace output between an elasticsearch server startup that fails vs. one that doesn't did NOT produce anything really all that useful, unless I'm just using it incorrectly (I just did strace <java command I got from 'ps' output>). The syscalls were identical all the way through to where it printed out "Bootstrapping Entitlements" after which the program execution diverged as expected. See [2] for small snippet of results.

I also played around with the EFS CSI driver and storage class that I've been using. ReadWriteMany is not the problem here as it also fails with ReadWriteOnce. Setting directoryPerms=777 in the storage class and adding supplementalGroups on the pod that correspond to the CSI driver's provided gid also doesn't work. In short, it kinda looks like almost all EFS usage is borked unless you explicitly set the uid/gid in the storageclass.

[1] (some paths altered to remove sensitive info)

strace of the ls -al command on the dir on which the problematic efs-created volume is mounted

statx(AT_FDCWD, "/usr/local/mydir/myapp/logs", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_MODE|STATX_NLINK|STATX_UID|STATX_GID|STATX_MTIME|STATX_SIZE, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|0700, stx_size=6144, ...}) = 0
lgetxattr("/usr/local/mydir/myapp/logs", "security.selinux", 0x5604c91e4340, 255) = -1 EOPNOTSUPP (Operation not supported)
getxattr("/usr/local/mydir/myapp/logs", "system.posix_acl_access", NULL, 0) = -1 EOPNOTSUPP (Operation not supported)

strace of the ls -al command on the dir on which the problem-free ebs-created volume is mounted

statx(AT_FDCWD, "/usr/local/mydir/myapp/logs", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_MODE|STATX_NLINK|STATX_UID|STATX_GID|STATX_MTIME|STATX_SIZE, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISGID|0775, stx_size=4096, ...}) = 0
lgetxattr("/usr/local/mydir/myapp/logs", "security.selinux", 0x56016ab0f340, 255) = -1 ENODATA (No data available)
getxattr("/usr/local/mydir/myapp/logs", "system.posix_acl_access", NULL, 0) = -1 ENODATA (No data available)
getxattr("/usr/local/mydir/myapp/logs", "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)

[2]: strace output of elasticsearch process (some paths altered to remove sensitive info)

access("/usr/local/mydir/myapp/es-server/elasticsearch/lib/cli-launcher/", F_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/mydir/myapp/es-server/elasticsearch/lib/cli-launcher/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|S_ISGID|0755, st_size=6, ...}) = 0
getdents64(3, 0x55d4e842f540 /
3 entries /, 32768) = 96
getdents64(3, 0x55d4e842f540 /
0 entries /, 32768) = 0
close(3) = 0
rt_sigaction(SIGRT_1, {sa_handler=0x7fa10b334b50, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SIGINFO, sa_restorer=0x7fa10b2ebc30}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fa109d77000
rt_sigprocmask(SIG_BLOCK, ~[], [], 8) = 0
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fa109e76910, parent_tid=0x7fa109e76910, exit_signal=0, stack=0x7fa109d77000, stack_size=0xfef00, tls=0x7fa109e76640}, 88) = -1 ENOSYS (Function not implemented)
clone(child_stack=0x7fa109e75ef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[34], tls=0x7fa109e76640, child_tidptr=0x7fa109e76910) = 34
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7fa109e76910, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 34, NULL, FUTEX_BITSET_MATCH_ANY[2025-10-23T18:53:31,246][INFO ][org.elasticsearch.bootstrap.Elasticsearch] [myapp-k8s-jbuhc-es-server-0.myapp-k8s-jbuhc-es-server-headless] version[8.19.4], pid[94], build[zip/aa0a7826e719b392e7782716b323c4fb8fa3b392/2025-09-16T22:06:03.940754111Z], OS[Linux/5.15.193-134.215.amzn2.x86_64/amd64], JVM[Eclipse Adoptium/OpenJDK 64-Bit Server VM/17.0.16/17.0.16+8]
[2025-10-23T18:53:31,248][INFO ][org.elasticsearch.bootstrap.Elasticsearch] [myapp-k8s-jbuhc-es-server-0.myapp-k8s-jbuhc-es-server-headless] JVM home [/usr/local/mydir/myapp/java], using bundled JDK [false]
[2025-10-23T18:53:31,248][INFO ][org.elasticsearch.bootstrap.Elasticsearch] [myapp-k8s-jbuhc-es-server-0.myapp-k8s-jbuhc-es-server-headless] JVM arguments [-Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j2.formatMsgNoLookups=true, -Djava.locale.providers=SPI,COMPAT, -Dorg.apache.lucene.vectorization.upperJavaFeatureVersion=17, -Des.path.home=/usr/local/mydir/myapp/es-server/elasticsearch, -Des.distribution.type=zip, -Des.java.type=ES_JAVA_HOME, -XX:ReplayDataFile=logs/replay_pid%p.log, -Des.entitlements.enabled=true, -XX:+EnableDynamicAgentLoading, -Djdk.attach.allowAttachSelf=true, --patch-module=java.base=/usr/local/mydir/myapp/es-server/elasticsearch/lib/entitlement-bridge/elasticsearch-entitlement-bridge-8.19.4.jar, --add-exports=java.base/org.elasticsearch.entitlement.bridge=org.elasticsearch.entitlement,java.logging,java.net.http,java.naming,jdk.net, -XX:+UseG1GC, -Djava.io.tmpdir=/tmp/elasticsearch-4448555691991616784, -XX:+HeapDumpOnOutOfMemoryError, -XX:+ExitOnOutOfMemoryError, -XX:ErrorFile=hs_err_pid%p.log, -Xlog:gc
,gc+age=trace,safepoint:file=gc.log:utctime,level,pid,tags:filecount=32,filesize=64m, -Xms2048m, -Xmx2048m, -XX:MaxDirectMemorySize=1073741824, -XX:G1HeapRegionSize=4m, -XX:InitiatingHeapOccupancyPercent=30, -XX:G1ReservePercent=15, --module-path=/usr/local/mydir/myapp/es-server/elasticsearch/lib, --add-modules=jdk.net, --add-modules=jdk.management.agent, --add-modules=ALL-MODULE-PATH, -Djdk.module.main=org.elasticsearch.server]
[2025-10-23T18:53:31,248][INFO ][org.elasticsearch.bootstrap.Elasticsearch] [myapp-k8s-jbuhc-es-server-0.myapp-k8s-jbuhc-es-server-headless] Default Locale [en]
[2025-10-23T18:53:31,610][INFO ][org.elasticsearch.nativeaccess.NativeAccess] [myapp-k8s-jbuhc-es-server-0.myapp-k8s-jbuhc-es-server-headless] Using [jna] native provider and native methods for [Linux]
[2025-10-23T18:53:31,733][INFO ][org.elasticsearch.bootstrap.Elasticsearch] [myapp-k8s-jbuhc-es-server-0.myapp-k8s-jbuhc-es-server-headless] Bootstrapping Entitlements

did you run strace with -f? Or even better with -ff / --follow-forks --output-separately ?

a little script:

# cat /tmp/bla.sh
uid=$(id -u elasticsearch)
for pid in $(pgrep -u $uid); do
  strace -tt -ff -p $pid -o /tmp/strace-$pid &
done

start this shell script ASAP after starting elasticsearch normally, assuming its running as user=elasticsearch, and it will trace all the child processes.

e.g. for my (standard) 9.1.5 setup I see:

# fgrep -h /root/tmp/.java_pid /tmp/strace-* | sort
22:42:01.950851 stat("/proc/9923/root/tmp/.java_pid9923", 0x7efc72750330) = -1 ENOENT (No such file or directory)
22:42:02.055341 stat("/proc/9923/root/tmp/.java_pid9923", {st_mode=S_IFSOCK|0600, st_size=0, ...}) = 0
22:42:02.055441 stat("/proc/9923/root/tmp/.java_pid9923", {st_mode=S_IFSOCK|0600, st_size=0, ...}) = 0
22:42:02.055882 stat("/proc/9923/root/tmp/.java_pid9923", {st_mode=S_IFSOCK|0600, st_size=0, ...}) = 0
22:42:02.056219 connect(57, {sa_family=AF_UNIX, sun_path="/proc/9923/root/tmp/.java_pid9923"}, 110) = 0
22:42:02.059988 connect(57, {sa_family=AF_UNIX, sun_path="/proc/9923/root/tmp/.java_pid9923"}, 110) = 0

and for the log directory:

# fgrep -h /var/log/elasticsearch /tmp/strace-* | sort | head -10
22:41:44.683063 mkdir("/var/log/elasticsearch", 0777) = -1 EEXIST (File exists)
22:41:44.683596 statx(AT_FDCWD, "/var/log/elasticsearch", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|S_ISGID|0750, stx_size=4096, ...}) = 0
22:41:44.692943 chdir("/var/log/elasticsearch") = 0
22:41:45.473736 getcwd("/var/log/elasticsearch", 4096) = 23
22:41:45.683130 getcwd("/var/log/elasticsearch", 4097) = 23
22:41:49.386476 readlink("/var/log/elasticsearch", 0x7efc7274d800, 1023) = -1 EINVAL (Invalid argument)
22:41:49.386587 readlink("/var/log/elasticsearch/elasticsearch_server.json", 0x7efc7274d800, 1023) = -1 EINVAL (Invalid argument)
22:41:49.386755 stat("/var/log/elasticsearch", {st_mode=S_IFDIR|S_ISGID|0750, st_size=4096, ...}) = 0
22:41:49.387075 mkdir("/var/log/elasticsearch", 0777) = -1 EEXIST (File exists)
22:41:49.387574 statx(AT_FDCWD, "/var/log/elasticsearch", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|S_ISGID|0750, stx_size=4096, ...}) = 0

I notice it actually does a chdir into the log directory, then a getcwd, which obviously works for me here. But in your case?

btw, note the calls to attach to the /proc/9923/root/tmp/.java_pid9923 came well after the syscalls to do with the log directory, which is the order we expected.

I looked at syscall traces from 8.18.0 startup vs 8.19.5 startup, of course a zillion differences.

One thing I noticed is that very early on in boot process it tries to effectively do log rotation on gc.log.

8.18.0

11:36:09.152514 stat("/var/log/elasticsearch/gc.log", {st_mode=S_IFREG|0644, st_size=3048, ...}) = 0
11:36:09.152560 stat("/var/log/elasticsearch/gc.log", {st_mode=S_IFREG|0644, st_size=3048, ...}) = 0
11:36:09.152604 stat("/var/log/elasticsearch/gc.log.00", {st_mode=S_IFREG|0644, st_size=3056, ...}) = 0
...

8.19.5

11:41:11.775613 stat("gc.log", {st_mode=S_IFREG|0644, st_size=40452, ...}) = 0
11:41:11.775655 stat("gc.log", {st_mode=S_IFREG|0644, st_size=40452, ...}) = 0
11:41:11.775701 stat("gc.log.00", {st_mode=S_IFREG|0644, st_size=3056, ...}) = 0
11:41:11.775740 stat("gc.log.00", {st_mode=S_IFREG|0644, st_size=3056, ...}) = 0
11:41:11.775776 stat("gc.log.00", {st_mode=S_IFREG|0644, st_size=3056, ...}) = 0
...

Note 8.19.5 doesn’t use the full paths, it’s already done a chdir to /var/log/elasticsearch by this point, which is different from 8.18.0

Suggestion for @buitcj Try pointing the location of the gc.log files to somewhere outside path.logs, e.g. /tmp, in jvm.options and see if it changes anything? Also, can you check if the user running elasticsearch can do a chdir into your equiv of /var/log/elasticsearch, as in 8.18.0 it didn’t seem to need to do so but in 8.19.5 it seems it does.

Looking at calls involving the .java_pid / .attach_pid:

8.18.0

11:36:24.669061 stat("/proc/3519336/root/tmp/.java_pid3519336", 0x73e7e27fe340) = -1 ENOENT (No such file or directory)
11:36:24.669145 openat(AT_FDCWD, "/proc/3519336/cwd/.attach_pid3519336", O_RDWR|O_CREAT|O_EXCL, 0666) = -1 EACCES (Permission denied)
11:36:24.669365 openat(AT_FDCWD, "/proc/3519336/root/tmp/.attach_pid3519336", O_RDWR|O_CREAT|O_EXCL, 0666) = 53
11:36:24.669725 readlink("/tmp/.attach_pid3519336", 0x73e7e27fbea0, 1023) = -1 EINVAL (Invalid argument)
11:36:24.670807 stat(".attach_pid3519336", 0x73e7b4cfda20) = -1 ENOENT (No such file or directory)
11:36:24.670913 stat("/tmp/.attach_pid3519336", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
11:36:24.672500 unlink("/tmp/.java_pid3519336.tmp") = -1 ENOENT (No such file or directory)
11:36:24.672575 bind(53, {sa_family=AF_UNIX, sun_path="/tmp/.java_pid3519336.tmp"}, 110) = 0
11:36:24.672703 chmod("/tmp/.java_pid3519336.tmp", 0600) = 0
11:36:24.672839 chown("/tmp/.java_pid3519336.tmp", 119, 125) = 0
11:36:24.672888 rename("/tmp/.java_pid3519336.tmp", "/tmp/.java_pid3519336") = 0

8.19.5

11:41:26.318501 stat("/proc/3544717/root/tmp/.java_pid3544717", 0x76c6e07fe320) = -1 ENOENT (No such file or directory)
11:41:26.318594 openat(AT_FDCWD, "/proc/3544717/cwd/.attach_pid3544717", O_RDWR|O_CREAT|O_EXCL, 0666) = 53

----> HERE
11:41:26.319543 readlink("/var/log/elasticsearch/.attach_pid3544717", 0x76c6e07fbe90, 1023) = -1 EINVAL (Invalid argument)
----> HERE

11:41:26.321214 stat(".attach_pid3544717", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
11:41:26.323904 unlink("/tmp/.java_pid3544717.tmp") = -1 ENOENT (No such file or directory)
11:41:26.323972 bind(53, {sa_family=AF_UNIX, sun_path="/tmp/.java_pid3544717.tmp"}, 110) = 0
11:41:26.324089 chmod("/tmp/.java_pid3544717.tmp", 0600) = 0
11:41:26.324210 chown("/tmp/.java_pid3544717.tmp", 119, 125) = 0
11:41:26.324260 rename("/tmp/.java_pid3544717.tmp", "/tmp/.java_pid3544717") = 0 

Note the readlink I highlighted, thats not in the 8.18.0 trace.

In 8.19.5 I see these chdir calls on startup:

11:41:09.273311 chdir("/tmp/final-flags5522972383749294867") = 0
11:41:10.340103 chdir("/tmp/final-flags17689081138414072901") = 0
11:41:11.762493 chdir("/var/log/elasticsearch") = 0
11:41:53.267101 chdir("/usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin") = 0

in 8.19.0 I just saw this:

11:36:52.759074 chdir("/usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin") = 0

Was this the working or the broken config? Can you share the other one for comparison?

Also yeah as @RainTown suggests, you need to pass the -f option to follow calls that aren’t on the main thread. You probably want to add -ttt as well to get timestamps. And -o to send output to a file rather than just sending it to stdout.

Summarizing a bit what I wrote in previous messages, it seems 8.19.5 does a chdir into the path.logs directory, 8.18.0 did not do this. There are relatively few *chdir calls, mostly into /tmp/hsperfdata_elasticsearch

It also tries to create the .attach.pid$PID file there, via /proc/$PID/cwd path, which is the same directory as path.logs at this point. This succeeds in my (standard install) case. The openat options are O_RDWR|O_CREAT|O_EXCL, and perms 0666. The file in this directory is closed and unlinked on my system approx 100 ms later. In between open and close there is a readlink call which shows the entire path (and returns -1 as its a file, not a link). Aside from log files (sic), I dont see other filesystem activity in that directory

I’m certainly not saying this is a smoking gun, but given the “unusual” setup in @buitcj ‘s path.logs directory maybe it’s interesting. It also sort of “links” the log directory and the “attach to JVM” topics, slightly.

I am also not sure it’s really an appropriate use of the path.logs directory ?

Another dump of info (using for pid in $(pgrep java); do strace -tt -ff -p $pid -o /tmp/strace-$pid &done)… Summary is I didn’t find a lot of useful information (to me), but maybe something will stick out to someone else.

Searching for "attach" in the strace output, my take away is it seem like both environments actually fail to readlink the attach_pid file (they both return -1) which is strange but if that's how it behaves in a working system then maybe this is not a problem. See [1]

Searching for the container's directory on which the volume is mounted (In this case /usr/local/mydir/myapp/logs, which is a dummy path I substituted to hide some pii), I didn't find anything particularly useful. I see that a sub-dir has S_ISGID set on the working environment that the problematic environment doesn’t have, but that doesn't really pertain to execution privileges, it pertains to what permissions a newly created file/dir has. See [2]

Searching for "java_pid" in the strace output, I didn't find anything interesting. It looks like the failing site never finds the /proc/303/root/tmp/.java_pid303 file, whereas on the working site a /proc/290/root/tmp/.java_pid290 file is found and eventually a connect call is invoked against it and then the file is deleted. See [3]

[1] “attach” search results

Failing system (looks like part of the way down, those writes are the stack trace that's printed out)

	18:03:05.778196 openat(AT_FDCWD, "/proc/303/cwd/.attach_pid303", O_RDWR|O_CREAT|O_EXCL, 0666) = 44                                
	18:03:05.784629 readlink("/usr/local/mydir/myapp/logs/myapp/.attach_pid303", 0x7f6eaabc1a30, 1023) = -1 EINVAL (Invalid argu ment)                                                                                                                             
	18:03:05.790869 write(1, "\tat sun.tools.attach.VirtualMach"..., 94) = 94                                                         
	18:03:05.790969 write(1, "\tat sun.tools.attach.AttachProvi"..., 108) = 108                                                       
	18:03:05.791072 write(1, "\tat com.sun.tools.attach.Virtual"..., 91) = 91                                                         
	18:03:05.785024 stat(".attach_pid303", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0      

Working system

	17:47:48.890508 openat(AT_FDCWD, "/proc/290/cwd/.attach_pid290", O_RDWR|O_CREAT|O_EXCL, 0666) = 44                               
	17:47:48.891237 readlink("/usr/local/mydir/myapp/logs/myapp/.attach_pid290", 0x7f3a94feba30, 1023) = -1 EINVAL (Invalid argu ment)                                                                                                                             
	17:47:48.991963 unlink("/usr/local/mydir/myapp/logs/myapp/.attach_pid290") = 0                                           

[2] “/usr/local/mydir/myapp/logs” search results, which is the container path of the mount

Failing system

	18:02:55.658113 mkdir("/usr/local/mydir/myapplogs/myapp", 0777) = -1 EEXIST (File exists)                                  
		18:02:55.660160 statx(AT_FDCWD, "/usr/local/mydir/myapplogs/myapp", AT_STATX_SYNC_AS_STAT, STATX_TYPE|STATX_MODE, {stx_mask =STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0777, stx_size=6144, ...}) = 0                                
		18:02:55.676975 chdir("/usr/local/mydir/myapplogs/myapp") = 0                                                              
		18:02:55.684193 getcwd("/usr/local/mydir/myapplogs/myapp", 4096) = 40                                                      
		18:02:56.471725 getcwd("/usr/local/mydir/myapplogs/myapp", 4096) = 40                                                      
		18:02:56.525819 getcwd("/usr/local/mydir/myapplogs/myapp", 4097) = 40                                                      
		18:02:58.690712 readlink("/usr/local/mydir/myapplogs", 0x7f6eaabc1380, 1023) = -1 EINVAL (Invalid argument)                        
		18:02:58.690767 readlink("/usr/local/mydir/myapplogs/myapp", 0x7f6eaabc1380, 1023) = -1 EINVAL (Invalid argument)          
		18:02:58.690827 readlink("/usr/local/mydir/myapplogs/myapp/myapp.log", 0x7f6eaabc1380, 1023) = -1 EINVAL (Invalid a rgument)                                                                                                                          
		18:02:58.692573 stat("/usr/local/mydir/myapplogs/myapp", {st_mode=S_IFDIR|0777, st_size=6144, ...}) = 0                    
		18:02:58.692858 mkdir("/usr/local/mydir/myapplogs/myapp", 0777) = -1 EEXIST (File exists)                                  
		18:02:58.694318 statx(AT_FDCWD, "/usr/local/mydir/myapplogs/myapp", AT_STATX_SYNC_AS_STAT, STATX_TYPE|STATX_MODE, {stx_mask =STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0777, stx_size=6144, ...}) = 0                                
		18:02:58.694389 openat(AT_FDCWD, "/usr/local/mydir/myapplogs/myapp/myapp.log", O_RDWR|O_CREAT|O_EXCL, 0666) = -1 EE XIST (File exists)                                                                                                                
		18:02:58.695094 stat("/usr/local/mydir/myapplogs/myapp/myapp.log", {st_mode=S_IFREG|0777, st_size=1110110, ...}) = 0                                                                                                                                 
		18:02:58.695165 openat(AT_FDCWD, "/usr/local/mydir/myapplogs/myapp/myapp.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 4 6                                                                                                                                 
		18:02:58.695925 stat("/usr/local/mydir/myapplogs/myapp/myapp.log", {st_mode=S_IFREG|0777, st_size=1110110, ...}) = 0                                                                                                                                 
		18:02:58.696009 access("/usr/local/mydir/myapplogs/myapp/myapp.log", F_OK) = 0                                     
		18:02:58.696077 statx(AT_FDCWD, "/usr/local/mydir/myapplogs/myapp/myapp.log", AT_STATX_SYNC_AS_STAT, STATX_ALL, {st x_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0777, stx_size=1110110, ...}) = 0                       
		18:02:58.696722 stat("/usr/local/mydir/myapplogs/myapp/myapp.log", {st_mode=S_IFREG|0777, st_size=1110110, ...}) = 0                                                                                                                                 
		18:02:58.696774 stat("/usr/local/mydir/myapplogs/myapp/myapp.log", {st_mode=S_IFREG|0777, st_size=1110110, ...}) = 0                                                                                                                                 
		18:02:58.757571 readlink("/usr/local/mydir/myapplogs", 0x7f6eaabc1380, 1023) = -1 EINVAL (Invalid argument)
		18:02:58.757623 readlink("/usr/local/mydir/myapplogs/myapp", 0x7f6eaabc1380, 1023) = -1 EINVAL (Invalid argument)
		18:02:58.757675 readlink("/usr/local/mydir/myapplogs/myapp/myapp_index_search_slowlog.log", 0x7f6eaabc1380, 1023) = -1 EINVAL (Invalid argument)
		18:02:58.759336 stat("/usr/local/mydir/myapplogs/myapp", {st_mode=S_IFDIR|0777, st_size=6144, ...}) = 0
		18:02:58.759426 mkdir("/usr/local/mydir/myapplogs/myapp", 0777) = -1 EEXIST (File exists)
		18:02:58.760202 statx(AT_FDCWD, "/usr/local/mydir/myapplogs/myapp", AT_STATX_SYNC_AS_STAT, STATX_TYPE|STATX_MODE, {stx_mask =STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0777, stx_size=6144, ...}) = 0
		18:02:58.760264 openat(AT_FDCWD, "/usr/local/mydir/myapplogs/myapp/myapp_index_search_slowlog.log", O_RDWR|O_CREAT| O_EXCL, 0666) = -1 EEXIST (File exists)
		18:02:58.760928 stat("/usr/local/mydir/myapplogs/myapp/myapp_index_search_slowlog.log", {st_mode=S_IFREG|0777, st_s ize=0, ...}) = 0
		18:02:58.760997 openat(AT_FDCWD, "/usr/local/mydir/myapplogs/myapp/myapp_index_search_slowlog.log", O_WRONLY|O_CREA T|O_APPEND, 0666) = 47
		18:02:58.761749 stat("/usr/local/mydir/myapplogs/myapp/myapp_index_search_slowlog.log", {st_mode=S_IFREG|0777, st_s ize=0, ...}) = 0
		18:02:58.761820 access("/usr/local/mydir/myapplogs/myapp/myapp_index_search_slowlog.log", F_OK) = 0
		18:02:58.761884 statx(AT_FDCWD, "/usr/local/mydir/myapplogs/myapp/myapp_index_search_slowlog.log", AT_STATX_SYNC_AS _STAT, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0777, stx_size=0, ...}) = 0
		18:02:58.761968 stat("/usr/local/mydir/myapplogs/myapp/myapp_index_search_slowlog.log", {st_mode=S_IFREG|0777, st_s ize=0, ...}) = 0
		18:02:58.762025 stat("/usr/local/mydir/myapplogs/myapp/myapp_index_search_slowlog.log", {st_mode=S_IFREG|0777, st_s ize=0, ...}) = 0
		18:02:58.764647 readlink("/usr/local/mydir/myapplogs", 0x7f6eaabc1380, 1023) = -1 EINVAL (Invalid argument)
		18:02:58.764681 readlink("/usr/local/mydir/myapplogs/myapp", 0x7f6eaabc1380, 1023) = -1 EINVAL (Invalid argument)
		18:02:58.764728 readlink("/usr/local/mydir/myapplogs/myapp/myapp_index_indexing_slowlog.log", 0x7f6eaabc1380, 1023) = -1 EINVAL (Invalid argument)
		18:02:58.766352 stat("/usr/local/mydir/myapplogs/myapp", {st_mode=S_IFDIR|0777, st_size=6144, ...}) = 0
		18:02:58.766425 mkdir("/usr/local/mydir/myapplogs/myapp", 0777) = -1 EEXIST (File exists)
		18:02:58.767221 statx(AT_FDCWD, "/usr/local/mydir/myapplogs/myapp", AT_STATX_SYNC_AS_STAT, STATX_TYPE|STATX_MODE, {stx_mask =STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0777, stx_size=6144, ...}) = 0
		18:02:58.767294 openat(AT_FDCWD, "/usr/local/mydir/myapplogs/myapp/myapp_index_indexing_slowlog.log", O_RDWR|O_CREA T|O_EXCL, 0666) = -1 EEXIST (File exists)
		18:02:58.768103 stat("/usr/local/mydir/myapplogs/myapp/myapp_index_indexing_slowlog.log", {st_mode=S_IFREG|0777, st _size=0, ...}) = 0
		18:02:58.768167 openat(AT_FDCWD, "/usr/local/mydir/myapplogs/myapp/myapp_index_indexing_slowlog.log", O_WRONLY|O_CR EAT|O_APPEND, 0666) = 48
		18:02:58.768940 stat("/usr/local/mydir/myapplogs/myapp/myapp_index_indexing_slowlog.log", {st_mode=S_IFREG|0777, st _size=0, ...}) = 0
		18:02:58.769005 access("/usr/local/mydir/myapplogs/myapp/myapp_index_indexing_slowlog.log", F_OK) = 0
		18:02:58.769063 statx(AT_FDCWD, "/usr/local/mydir/myapplogs/myapp/myapp_index_indexing_slowlog.log", AT_STATX_SYNC_ AS_STAT, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0777, stx_size=0, ...}) = 0
		18:02:58.769136 stat("/usr/local/mydir/myapplogs/myapp/myapp_index_indexing_slowlog.log", {st_mode=S_IFREG|0777, st _size=0, ...}) = 0
		18:02:58.769188 stat("/usr/local/mydir/myapplogs/myapp/myapp_index_indexing_slowlog.log", {st_mode=S_IFREG|0777, st _size=0, ...}) = 0
		18:03:05.784206 readlink("/proc/303/cwd", "/usr/local/mydir/myapplogs/search"..., 1023) = 39
		18:03:05.784510 readlink("/usr/local/mydir/myapplogs", 0x7f6eaabc1a30, 1023) = -1 EINVAL (Invalid argument)
		18:03:05.784570 readlink("/usr/local/mydir/myapplogs/myapp", 0x7f6eaabc1a30, 1023) = -1 EINVAL (Invalid argument)
		18:03:05.784629 readlink("/usr/local/mydir/myapplogs/myapp/.attach_pid303", 0x7f6eaabc1a30, 1023) = -1 EINVAL (Invalid argu ment)
	
	
	Working system
	
		17:47:38.970577 mkdir("/usr/local/mydir/myapp/logs/myapp", 0777) = -1 EEXIST (File exists)                                  
		17:47:38.970693 statx(AT_FDCWD, "/usr/local/mydir/myapp/logs/myapp", AT_STATX_SYNC_AS_STAT, STATX_TYPE|STATX_MODE, {stx_mask =STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|S_ISGID|0775, stx_size=4096, ...}) = 0                        
		17:47:38.987399 chdir("/usr/local/mydir/myapp/logs/myapp") = 0                                                              
		17:47:38.995145 getcwd("/usr/local/mydir/myapp/logs/myapp", 4096) = 40                                                      
		17:47:39.807473 getcwd("/usr/local/mydir/myapp/logs/myapp", 4096) = 40                                                      
		17:47:39.868324 getcwd("/usr/local/mydir/myapp/logs/myapp", 4097) = 40                                                      
		17:47:42.158019 readlink("/usr/local/mydir/myapp/logs", 0x7f3a94feb380, 1023) = -1 EINVAL (Invalid argument)                        
		17:47:42.158079 readlink("/usr/local/mydir/myapp/logs/myapp", 0x7f3a94feb380, 1023) = -1 EINVAL (Invalid argument)          
		17:47:42.158145 readlink("/usr/local/mydir/myapp/logs/myapp/myapp.log", 0x7f3a94feb380, 1023) = -1 EINVAL (Invalid a rgument)                                                                                                                          
		17:47:42.158238 stat("/usr/local/mydir/myapp/logs/myapp", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0            
		17:47:42.158555 mkdir("/usr/local/mydir/myapp/logs/myapp", 0777) = -1 EEXIST (File exists)                                  
		17:47:42.158700 statx(AT_FDCWD, "/usr/local/mydir/myapp/logs/myapp", AT_STATX_SYNC_AS_STAT, STATX_TYPE|STATX_MODE, {stx_mask =STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|S_ISGID|0775, stx_size=4096, ...}) = 0                        
		17:47:42.158779 openat(AT_FDCWD, "/usr/local/mydir/myapp/logs/myapp/myapp.log", O_RDWR|O_CREAT|O_EXCL, 0666) = -1 EE XIST (File exists)                                                                                                                
		17:47:42.158919 stat("/usr/local/mydir/myapp/logs/myapp/myapp.log", {st_mode=S_IFREG|0664, st_size=709226, ...}) = 0
		17:47:42.158999 openat(AT_FDCWD, "/usr/local/mydir/myapp/logs/myapp/myapp.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 46                                                                                                                                 
		17:47:42.159115 stat("/usr/local/mydir/myapp/logs/myapp/myapp.log", {st_mode=S_IFREG|0664, st_size=709226, ...}) = 0
		17:47:42.159205 access("/usr/local/mydir/myapp/logs/myapp/myapp.log", F_OK) = 0                                     
		17:47:42.159288 statx(AT_FDCWD, "/usr/local/mydir/myapp/logs/myapp/myapp.log", AT_STATX_SYNC_AS_STAT, STATX_ALL, {st x_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0664, stx_size=709226, ...}) = 0                                
		17:47:42.160091 stat("/usr/local/mydir/myapp/logs/myapp/myapp.log", {st_mode=S_IFREG|0664, st_size=709226, ...}) = 0
		17:47:42.160162 stat("/usr/local/mydir/myapp/logs/myapp/myapp.log", {st_mode=S_IFREG|0664, st_size=709226, ...}) = 0
		17:47:42.170320 readlink("/usr/local/mydir/myapp/logs", 0x7f3a94feb380, 1023) = -1 EINVAL (Invalid argument)                        
		17:47:42.170377 readlink("/usr/local/mydir/myapp/logs/myapp", 0x7f3a94feb380, 1023) = -1 EINVAL (Invalid argument)          
		17:47:42.170423 readlink("/usr/local/mydir/myapp/logs/myapp/myapp_index_search_slowlog.log", 0x7f3a94feb380, 1023) = -1 EINVAL (Invalid argument)                                                                                                     
		17:47:42.170483 stat("/usr/local/mydir/myapp/logs/myapp", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0            
		17:47:42.170557 mkdir("/usr/local/mydir/myapp/logs/myapp", 0777) = -1 EEXIST (File exists)                                  
		17:47:42.170638 statx(AT_FDCWD, "/usr/local/mydir/myapp/logs/myapp", AT_STATX_SYNC_AS_STAT, STATX_TYPE|STATX_MODE, {stx_mask =STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|S_ISGID|0775, stx_size=4096, ...}) = 0
		17:47:42.170684 openat(AT_FDCWD, "/usr/local/mydir/myapp/logs/myapp/myapp_index_search_slowlog.log", O_RDWR|O_CREAT|O_EXCL, 0666) = -1 EEXIST (File exists)
		17:47:42.170757 stat("/usr/local/mydir/myapp/logs/myapp/myapp_index_search_slowlog.log", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
		17:47:42.170821 openat(AT_FDCWD, "/usr/local/mydir/myapp/logs/myapp/myapp_index_search_slowlog.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 47
		17:47:42.170917 stat("/usr/local/mydir/myapp/logs/myapp/myapp_index_search_slowlog.log", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
		17:47:42.170987 access("/usr/local/mydir/myapp/logs/myapp/myapp_index_search_slowlog.log", F_OK) = 0
		17:47:42.171032 statx(AT_FDCWD, "/usr/local/mydir/myapp/logs/myapp/myapp_index_search_slowlog.log", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0664, stx_size=0, ...}) = 0
		17:47:42.171102 stat("/usr/local/mydir/myapp/logs/myapp/myapp_index_search_slowlog.log", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
		17:47:42.171153 stat("/usr/local/mydir/myapp/logs/myapp/myapp_index_search_slowlog.log", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
		17:47:42.173128 readlink("/usr/local/mydir/myapp/logs", 0x7f3a94feb380, 1023) = -1 EINVAL (Invalid argument)
		17:47:42.173177 readlink("/usr/local/mydir/myapp/logs/myapp", 0x7f3a94feb380, 1023) = -1 EINVAL (Invalid argument)
		17:47:42.173219 readlink("/usr/local/mydir/myapp/logs/myapp/myapp_index_indexing_slowlog.log", 0x7f3a94feb380, 1023) = -1 EINVAL (Invalid argument)
		17:47:42.173263 stat("/usr/local/mydir/myapp/logs/myapp", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
		17:47:42.173328 mkdir("/usr/local/mydir/myapp/logs/myapp", 0777) = -1 EEXIST (File exists)
		17:47:42.173418 statx(AT_FDCWD, "/usr/local/mydir/myapp/logs/myapp", AT_STATX_SYNC_AS_STAT, STATX_TYPE|STATX_MODE, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|S_ISGID|0775, stx_size=4096, ...}) = 0
		17:47:42.173465 openat(AT_FDCWD, "/usr/local/mydir/myapp/logs/myapp/myapp_index_indexing_slowlog.log", O_RDWR|O_CREAT|O_EXCL, 0666) = -1 EEXIST (File exists)
		17:47:42.173538 stat("/usr/local/mydir/myapp/logs/myapp/myapp_index_indexing_slowlog.log", {st_mode=S_IFREG|0664, st_size=1824, ...}) = 0
		17:47:42.173603 openat(AT_FDCWD, "/usr/local/mydir/myapp/logs/myapp/myapp_index_indexing_slowlog.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 48
		17:47:42.173688 stat("/usr/local/mydir/myapp/logs/myapp/myapp_index_indexing_slowlog.log", {st_mode=S_IFREG|0664, st_size=1824, ...}) = 0
		17:47:42.173755 access("/usr/local/mydir/myapp/logs/myapp/myapp_index_indexing_slowlog.log", F_OK) = 0
		17:47:42.173806 statx(AT_FDCWD, "/usr/local/mydir/myapp/logs/myapp/myapp_index_indexing_slowlog.log", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0664, stx_size=1824, ...}) = 0
		17:47:42.173904 stat("/usr/local/mydir/myapp/logs/myapp/myapp_index_indexing_slowlog.log", {st_mode=S_IFREG|0664, st_size=1824, ...}) = 0
		17:47:42.173956 stat("/usr/local/mydir/myapp/logs/myapp/myapp_index_indexing_slowlog.log", {st_mode=S_IFREG|0664, st_size=1824, ...}) = 0
		17:47:48.890851 readlink("/proc/290/cwd", "/usr/local/mydir/myapp/logs/search"..., 1023) = 39
		17:47:48.891139 readlink("/usr/local/mydir/myapp/logs", 0x7f3a94feba30, 1023) = -1 EINVAL (Invalid argument)
		17:47:48.891188 readlink("/usr/local/mydir/myapp/logs/myapp", 0x7f3a94feba30, 1023) = -1 EINVAL (Invalid argument)
		17:47:48.891237 readlink("/usr/local/mydir/myapp/logs/myapp/.attach_pid290", 0x7f3a94feba30, 1023) = -1 EINVAL (Invalid argument)
		17:47:48.991963 unlink("/usr/local/mydir/myapp/logs/myapp/.attach_pid290") = 0
		17:47:49.570007 access("/usr/local/mydir/myapp/logs/myapp", F_OK) = 0
		17:47:49.570332 readlink("/usr/local/mydir/myapp/logs", 0x7f3a94febfc0, 1023) = -1 EINVAL (Invalid argument)
		17:47:49.570382 readlink("/usr/local/mydir/myapp/logs/myapp", 0x7f3a94febfc0, 1023) = -1 EINVAL (Invalid argument)
		17:48:12.177462 access("/usr/local/mydir/myapp/logs/myapp", F_OK) = 0
		17:48:12.177791 readlink("/usr/local/mydir/myapp/logs", 0x7f3a438f7fb0, 1023) = -1 EINVAL (Invalid argument)
		17:48:12.177841 readlink("/usr/local/mydir/myapp/logs/myapp", 0x7f3a438f7fb0, 1023) = -1 EINVAL (Invalid argument)

[3] “java_pid” search results

Failing system

	18:02:53.373407 stat("/tmp/.java_pid189", 0x7fb95f1c2a10) = -1 ENOENT (No such file or directory)                                 
	18:02:55.458332 stat("/tmp/.java_pid276", 0x7f6bca835a10) = -1 ENOENT (No such file or directory)                                 
	18:02:56.480801 stat("/tmp/.java_pid303", 0x7f6eaabc4a10) = -1 ENOENT (No such file or directory)                                 
	18:03:05.777122 stat("/proc/303/root/tmp/.java_pid303", 0x7f6eaabc4300) = -1 ENOENT (No such file or directory)                   
	18:03:05.887211 stat("/proc/303/root/tmp/.java_pid303", 0x7f6eaabc4300) = -1 ENOENT (No such file or directory)                   
	18:03:06.087517 stat("/proc/303/root/tmp/.java_pid303", 0x7f6eaabc4300) = -1 ENOENT (No such file or directory)                   
	18:03:06.387764 stat("/proc/303/root/tmp/.java_pid303", 0x7f6eaabc4300) = -1 ENOENT (No such file or directory)                   
	18:03:06.788024 stat("/proc/303/root/tmp/.java_pid303", 0x7f6eaabc4300) = -1 ENOENT (No such file or directory)
	<truncated, ENOENT recorded for 10 seconds>
	
Working system

	17:47:36.857735 stat("/tmp/.java_pid191", 0x7fbd6b77ea10) = -1 ENOENT (No such file or directory)
	17:47:38.790335 stat("/tmp/.java_pid263", 0x7ff7849efa10) = -1 ENOENT (No such file or directory)
	17:47:39.817403 stat("/tmp/.java_pid290", 0x7f3a94feea10) = -1 ENOENT (No such file or directory)
	17:47:48.889487 stat("/proc/290/root/tmp/.java_pid290", 0x7f3a94fee300) = -1 ENOENT (No such file or directory)
	17:47:48.991781 stat("/proc/290/root/tmp/.java_pid290", {st_mode=S_IFSOCK|0600, st_size=0, ...}) = 0
	17:47:48.991886 stat("/proc/290/root/tmp/.java_pid290", {st_mode=S_IFSOCK|0600, st_size=0, ...}) = 0
	17:47:48.992212 stat("/proc/290/root/tmp/.java_pid290", {st_mode=S_IFSOCK|0600, st_size=0, ...}) = 0
	17:47:48.992374 connect(54, {sa_family=AF_UNIX, sun_path="/proc/290/root/tmp/.java_pid290"}, 110) = 0
	17:47:48.992902 connect(54, {sa_family=AF_UNIX, sun_path="/proc/290/root/tmp/.java_pid290"}, 110) = 0
	17:47:48.893425 unlink("/tmp/.java_pid290.tmp") = -1 ENOENT (No such file or directory)
	17:47:48.893501 bind(44, {sa_family=AF_UNIX, sun_path="/tmp/.java_pid290.tmp"}, 110) = 0
	17:47:48.893625 chmod("/tmp/.java_pid290.tmp", 0600) = 0
	17:47:48.893773 chown("/tmp/.java_pid290.tmp", 500, 500) = 0
	17:47:48.893826 rename("/tmp/.java_pid290.tmp", "/tmp/.java_pid290") = 0

The readlinks are IMO irrelevant, it is just checking if paths are symlinks or regular files/dirs. In some scenarios, if it’s expecting the latter, but finds the former, it can be suspicious.

Those logs for failing system dont include the unlink call on /usr/local/mydir/myapp/logs/myapp/.attach_pid290, did you not include it or is it not there.

The logic is a bit confusing - it creates this empty .attach_pid*file but seems to just do a stat on it (in a separate thread) and then unlink it. On a working system. It used to do this elsewhere on the filesystem, but in 8.19.5 (likely all 8.19+) it does so in path.logs directory which it has chdir-ed to so is the CWD.

In a normal shell, as the elasticsearch process running user, can you chdir to the “failing path.logs directory, and while there create then stat then remove an empty file?

EDIT: I read a bit, and it seems this “create .attach_pid, stat that file in another thread, then unlink it” sequence is normal. To quote:

Every HotSpot JVM periodically checks for a file named .attach_pid in /tmp or under its /proc//root/tmp. If it finds one, it knows someone wants to attach.

It’s still not clear why thats happening in the directory from path.logs and not one of the temporary directories above. And maybe, in some cases, this mechanism is failing to actually wake up the “Attach Listener”, so your actual attach fails, via the .java_pid* file, fails. Can maybe workaround with “-Dcom.sun.management.jmxremote” to the JVM options as this forces the “Attach Listener” to start without the need for .attach_pid “prompt”.

sorry, too many posts, but looking in JDK source code ./src/hotspot/os/posix/attachListener_posix.cpp

is this:

  os::snprintf_checked(fn, sizeof(fn), ".attach_pid%d",
                       os::current_process_id());
  RESTARTABLE(::stat(fn, &st), ret);
  if (ret == -1) {
    log_trace(attach)("Failed to find attach file: %s, trying alternate", fn);
    os::snprintf_checked(fn, sizeof(fn), "%s/.attach_pid%d", os::get_temp_directory(),
                         os::current_process_id());
    RESTARTABLE(::stat(fn, &st), ret);
    if (ret == -1) {
      log_debug(attach)("Failed to find attach file: %s", fn);
    }
  }
  if (ret == 0) {
    // simple check to avoid starting the attach mechanism when
    // a bogus non-root user creates the file
    if (os::Posix::matches_effective_uid_or_root(st.st_uid)) {
      init();
      log_trace(attach)("Attach triggered by %s", fn);
      return true;
    } else {
      log_debug(attach)("File %s has wrong user id %d (vs %d). Attach is not triggered", fn, st.st_uid, geteuid());
    }
  }
  return false;

ret = 0 means (I think) the .attach_pid file was successfully created in the path.logs defined directory, which your logs also indicate is so. It then does an ownership checks. If debug logging was enabled for this specific part of the JDK, and I dont know how to do that, I think you may see the error there, as if I’ve understood correctly the files have different owners as the effective UID running elasticsearch, nor is that the root user.

EDIT you can see these logs with -Xlog:attach=trace added to jdk command line. In my system with 8.18.0 you see:

Oct 25 12:36:16 u2024 systemd-entrypoint[265689]: [3.369s][trace][attach] Failed to find attach file: .attach_pid265689, trying alternate
Oct 25 12:36:16 u2024 systemd-entrypoint[265689]: [3.369s][trace][attach] Attach triggered by /tmp/.attach_pid265689

This is because ti tried the “cwd” first, fails, then tries /tmp/…

In 8.19.5 the log is:

Oct 25 12:38:27 u2024 systemd-entrypoint[271234]: [3.054s][trace][attach] Attach triggered by .attach_pid271234
Oct 25 12:38:27 u2024 systemd-entrypoint[271234]: [3.054s][debug][attach] default streaming output: 1
Oct 25 12:38:27 u2024 systemd-entrypoint[271234]: [3.155s][debug][attach] read request: cmd = getversion
Oct 25 12:38:27 u2024 systemd-entrypoint[271234]: [3.155s][debug][attach] read request: arg = options
Oct 25 12:38:27 u2024 systemd-entrypoint[271234]: [3.155s][debug][attach] read request: arg =
Oct 25 12:38:27 u2024 systemd-entrypoint[271234]: [3.155s][debug][attach] read request: arg =
Oct 25 12:38:27 u2024 systemd-entrypoint[271234]: [3.155s][debug][attach] executing command getversion, streaming output: 1 (supported by impl: 1)
Oct 25 12:38:27 u2024 systemd-entrypoint[271234]: [3.156s][debug][attach] v2 request, data size = 195
Oct 25 12:38:27 u2024 systemd-entrypoint[271234]: [3.156s][debug][attach] option: streaming, value: 0
Oct 25 12:38:27 u2024 systemd-entrypoint[271234]: [3.156s][debug][attach] read request: cmd = load
Oct 25 12:38:27 u2024 systemd-entrypoint[271234]: [3.156s][debug][attach] read request: arg = instrument
Oct 25 12:38:27 u2024 systemd-entrypoint[271234]: [3.156s][debug][attach] read request: arg = false
Oct 25 12:38:27 u2024 systemd-entrypoint[271234]: [3.156s][debug][attach] read request: arg = /usr/share/elasticsearch/lib/entitlement-agent/elasticsearch-entitlement-agent-8.19.6.jar=org.elasticsearch.entitlement.initialization.EntitlementInitialization
Oct 25 12:38:27 u2024 systemd-entrypoint[271234]: [3.156s][debug][attach] executing command load, streaming output: 0 (supported by impl: 1

Here the CWD is /var/log/elasticsearch, as 8.19.5 chdir-ed to that directory (I am still unconvinced that was good idea) and obviously with my completely standard setup stuff works first time there, the .attach_pid* file is created there, and the Attach Listener is triggered.

In your case @buitcj I think you might see different messages with -Xlog:attach=trace - please try.

1 Like

Incredible find!

Took me a second but after I did export ES_JAVA_OPTS=-Xlog:attach=trace and launched again, I saw:

[2025-10-25T12:33:58,737][INFO ][org.elasticsearch.bootstrap.Elasticsearch] [] Bootstrapping Entitlements
[5.294s][debug][attach] File .attach_pid216 has wrong user id 50017 (vs 500). Attach is not triggered

You caught me as I’m about to head out the door for a trip, but this is really promising and I’ll continue investigating when I get back!

Try -XX:+StartAttachListener , might “just work”.

Dunno why, just determined to understand the issue, keep my brain busy!

EDIT: I am quietly confident adding -XX:+StartAttachListener will “just work”, as that does not seem to then even log the [debug][attach] File .attach_pid… messages, So I think that part of the (JVM) code is not even called when that option is given, so you avoid the .attach_pid* file being created in the log directory, and issues therein, completely.

% /usr/share/elasticsearch/jdk/bin/java -XX:+PrintFlagsFinal -version | fgrep StartAttachListener
     bool StartAttachListener                      = false                                     {product} {default}
openjdk version "25" 2025-09-16
OpenJDK Runtime Environment (build 25+36-3489)
OpenJDK 64-Bit Server VM (build 25+36-3489, mixed mode, sharing)

check your own JDK has that option, no idea when it was introduced.

1 Like

@RainTown Thank you so much! You really made a superhuman effort to help a desperate random forum poster. Yes, -XX:+StartAttachListener does workaround the problem. I think that’s a wrap, assuming Elasticsearch isn’t going to revert the commit to change the working directory to the logs dir. I’ll quickly recap things here for posterity.

Normal attach mechanism

JVM doesn't have attach listener running
Attacher (in this case Elasticsearch) [indirectly?] creates either /proc/cwd/.attach_pid or /tmp/.attach_pid
JVM checks if .attach_pid file exists and, if so, starts the AttachListener thread and a .java_pid socket endpoint to which the attacher can attach.

My application’s problem:

Using a logs volume requested by a PVC serviced by an EFS CSI driver, the UID and GID of this volume may not be the Elasticsearch user, but instead, it may be 50000 or even a dynamically generated ID (see gidRangeStart/End). This also affects files created in this volume; files created here may be assigned 50000 user and group ownership, despite the Elasticsearch user having created the file.

In 8.19, Elasticsearch changed their working dir to use the logs dir.

Files created in the logs dir will have UID and GID ownership as defined in the storageclass configs, so the .attach_pid will have UID = GID = 50000, for example.

JVM code AttachListener::is_init_trigger() in attachListener_linux.cpp will fail the attachment handshake if the UID (e.g., 50000) of the file does not match the EUID of the process (i.e., Elasticsearch user).

What does -XX:+StartAttachListener do?

Starts the Attach Listener thread at JVM startup and also creates the .java_pid file at startup so the handshaking doesn't need to happen. The problematic .attach_pid doesn't need to get created and checked.

This seems like a pretty significant deviation from the way that filesystems are supposed to behave on Unix-like systems. When a process creates an object in the filesystem (a file or directory, or something more exotic like a Unix domain socket), the object should start out owned by the eUID of the process.

3 Likes

@buitcj Nice summary.

The only bit I dont get is how the owner of the .attach_pid* file, when created in the logs directory, gets set to 50017. Which settings are this? The JVM is still running as 500, and you have fsGroup set to 50017, but there must be an additional settings somewhere else to result in:

I’ve no k8s to hand, but chatGPT suggests this sort of thing might be at play::

aws efs create-access-point \
  --file-system-id fs-12345 \
  --posix-user Uid=50017,Gid=50017 \
  --root-directory "Path=/var/log/app,CreationInfo={OwnerUid=50017,OwnerGid=50017,Permissions=770}"

with

spec:
...
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-12345::fsap-12345 

?

This seems like a pretty significant deviation from the way that filesystems are supposed to behave on Unix-like systems. When a process creates an object in the filesystem (a file or directory, or something more exotic like a Unix domain socket), the object should start out owned by the eUID of the process.

Yes, agreed. This EFS volume isn’t exactly part of our required architecture - rather, customers can specify any volume of their choosing for the logs directory so this only affects a subset of our customers who are specifying EFS without explicitly setting the uid/gid (probably all customers using EFS)

1 Like

@RainTown I’m not a k8s or efs expert, but I think the settings are found here:

GitHub - kubernetes-sigs/aws-efs-csi-driver: CSI Driver for Amazon EFS https://aws.amazon.com/efs/.

So the way I played around with this is to create a persistent volume claim that requests the EFS CSI driver for a volume. In that claim/request, you specify a storage class. In that storage class config, you specify the parameters listed in the link above. In the one I’m tinkering with, I just have

parameters:
  directoryPerms: "700"
  ensureUniqueDirectory: "false"
  fileSystemId: <redacted>
  provisioningMode: efs-ap

Because an uid/gid is not explicitly set, gidRangeStart/End will be used, so my understanding is that the gid range will be between 50000-7000000 (the defaults). I don’t see an equivalent UID range setting, but uid = gid for all files created in this volume mount. And yes, for every new PVC requested we get a seemingly different gid.

My favorite LLM tells me that the behavior of UID/GID ownership for files in the EFS volume will not correspond to current user / creator / euid and that is just the behavior of EFS since it doesn’t have mappings to linux IDs. I am not a file system expert, but that feels odd to me since this isn’t a problem with other types of volumes.

The workarounds are:

a) use -XX:+StartAttachListener as you noted

b) set uid/gid to the elasticsearch user in the storageclass specified by the PVC. This makes it so all files created in the volume will be the one expected by the attach thread. This will work, but our customers never had to do that before so I’m not sure we’ll recommend doing this. I feel like I’d opt to use the startAttachListener flag that you noted to make the fix invisible to customers. Setting the uid/gid in the storageclass also may be conflicting with a reason our customers wanted dynamic gids to be used in the first place - some research tells me having the gids being unique may actually be a security / workload isolation feature that is desired.

c) as noted before, setting runAsUser and fsGroup to just match the uid/gid ownership on the EFS volume, but that isn’t a great solution because someone may have other volumes and other types of volumes needed by elasticsearch that will conflict with this dynamic gid

1 Like

Ughhhh. This thread is IMHO not a good advert for EFS.

I’m glad (and I admit a little surprised) a workaround was found without any code changes anywhere.

$ /usr/share/elasticsearch/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal -version 2>/dev/null | wc -l
601

I also massively underestimated how many -XX options the java (openjdk) command line would support.

Signing off this thread …

Just filling in a missing link in the above conversation: the reason why the attach socket is appearing in the logs directory in 8.19 but not in earlier versions is because of #124966 changing the working directory, which isn’t directly related to the introduction of entitlements in 8.19.

1 Like