profiling_inferred_spans_enabled causes "IOException: Input/output error" when closing a file stream on a Windows-based NAS server

Kibana: 7.9
Elasticsearch: 7.9
Java 1.18.1
OS: Linux

When profiling_inferred_spans_enabled is enabled, Java throws the following exception when closing a file stream:

java.io.IOException: Input/output error
	at java.io.FileOutputStream.close0(Native Method)
	at java.io.FileOutputStream.access$000(FileOutputStream.java:53)
	at java.io.FileOutputStream$1.close(FileOutputStream.java:356)
	at java.io.FileDescriptor.closeAll(FileDescriptor.java:212)
	at java.io.FileOutputStream.close(FileOutputStream.java:354)

(Note that the exception is thrown from a native method, which suggests that async-profiler is the cause, since it instruments native methods.)

The APM Java agent documentation says that inferred spans are not supported on Windows because the async-profiler dependency does not support Windows. My 'problem' was that I used APM on a WebLogic server running on Linux, but the IOException occurs when I close a file stream on a Linux directory that is mapped to network storage running Windows.

Disabling profiling_inferred_spans seems to fix the issue, because async-profiler is then no longer used.

I think it would be good to mention in the documentation that profiling_inferred_spans_enabled is not compatible with writing files to a Linux directory on which Windows-based network storage is mounted, even if your own application runs on Linux.

Hi @NickWe,

Using Linux from within Windows, whether through a container or WSL, is quite a common setup, and I'm afraid the agent won't be able to detect which filesystem it's currently using.

However, while the async profiler is not supported on Windows, we should at least:

  • not have any unexpected side-effect on the application
  • gracefully disable the feature when it's not supported

Where do you see this exception thrown?

  • within your application, for any file access using FileOutputStream?
  • only within the agent logs?
  • could you provide the full stack trace and/or the agent log with log_level=debug?

To add to Sylvain's suggestions, could you try out the latest snapshot from master? It includes a newer version of async-profiler, so the problem may already be solved there.

Have you checked the async-profiler GitHub issues for similar reports?

Hi @Sylvain_Juge. I think there is a misunderstanding. The application runs on Linux and writes to the Linux directory "/share/virusscan", on which a Windows CIFS share is mounted. So, indirectly, the Java application running on Linux writes to a Windows network share.

https://linuxize.com/post/how-to-mount-cifs-windows-share-on-linux/#:~:text=On%20Linux%20and%20UNIX%20operating,is%20a%20form%20of%20SMB.

The error occurs whenever I use a try-with-resources statement with a FileOutputStream: the close() method is then called automatically, which eventually calls the native close0 method, and that is what throws the exception.
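A minimal sketch of the pattern described above, for anyone trying to reproduce it. The class name `CloseRepro` and the file name are hypothetical; the default directory is the `/share/virusscan` mount point from this thread. The implicit `close()` at the end of the try-with-resources block is where `FileOutputStream.close0` runs, and an `IOException` from it is caught the same way as one thrown from the body.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical reproduction of the reported pattern: write a small file and let
// try-with-resources close it. On the affected setup, the "Input/output error"
// was reported from the implicit close().
public class CloseRepro {

    /** Writes data to path; returns null on success, or the exception message on failure. */
    public static String writeAndClose(String path, byte[] data) {
        try (FileOutputStream out = new FileOutputStream(path)) {
            out.write(data);
        } catch (IOException e) {
            // Catches failures from opening, writing, and the implicit close().
            return e.getMessage();
        }
        return null;
    }

    public static void main(String[] args) {
        // Default target is the CIFS-mounted directory from the report; pass a path to override it.
        String path = args.length > 0 ? args[0] : "/share/virusscan/close-repro.txt";
        String error = writeAndClose(path, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(error == null ? "closed cleanly" : "failed: " + error);
    }
}
```

Running it once with the agent and profiling_inferred_spans_enabled=true and once without would show whether the failure follows the setting.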

I can't share more than this stacktrace unfortunately. After the stack trace I initially sent there is only application code.

I think the error is caused by async-profiler, but I thought it would be good to document it.
Also, when profiling_inferred_spans is off, the error no longer occurs, which is another reason I think async-profiler is the cause.

While networked filesystems look similar to local ones on the surface, in practice they rarely behave exactly the same; at the very least, there is extra latency. Add the fact that the network share has "virusscan" in its folder name, so an antivirus might be involved, and we can't really expect it to behave like a local filesystem with low latency and high reliability.

Our agent only uses disk storage for its logs and for async-profiler's temporary files; neither needs to be stored permanently, and the profiler files go to the /tmp folder. Is the whole Linux filesystem stored on this network share?
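One way to answer this from inside the JVM is to ask which filesystem backs each directory. This is a sketch using the standard `java.nio.file.FileStore.type()` API, which on Linux reports the mount type (for example `ext4`, `tmpfs` or `cifs`). The class name is hypothetical, and it assumes the agent's temporary files end up under `java.io.tmpdir`, which is normally `/tmp` on Linux.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic: print the filesystem type behind a few directories to see
// whether the temp directory or the application's output directory is a network share.
public class FsTypeCheck {

    /** Returns the type of the file store containing dir, e.g. "ext4" or "cifs" on Linux. */
    public static String fsType(Path dir) throws IOException {
        FileStore store = Files.getFileStore(dir);
        return store.type();
    }

    public static void main(String[] args) {
        // Assumption: the agent's temporary files live under java.io.tmpdir (usually /tmp).
        String[] dirs = args.length > 0
                ? args
                : new String[] {System.getProperty("java.io.tmpdir"), "/share/virusscan"};
        for (String d : dirs) {
            Path p = Paths.get(d);
            try {
                System.out.println(p + " -> " + fsType(p));
            } catch (IOException e) {
                // Missing or inaccessible directories are reported rather than aborting the check.
                System.out.println(p + " -> unavailable: " + e);
            }
        }
    }
}
```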

As Felix already told you, trying our latest release might be a good option, as it includes an async-profiler update. Also, could you confirm that this error goes away when the profiler is disabled?

Hi @Sylvain_Juge, no, there is no network-attached storage mounted on the /tmp folder.

I just tried the snapshot release with inferred_spans_enabled, and I still get an Input/output error when the close0 method is executed. I think this happens because async-profiler does not work with Windows and, well, the network-attached storage runs Windows.

I will just use APM without the inferred_spans_enabled feature. Just wanted to point the error out.
Maybe it would be useful to add a CI job that tests inferred_spans_enabled on a Linux VM and checks how it interacts with a networked filesystem served from Windows.
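A CI job like this could run a small stress harness twice against a CIFS-mounted directory, once with the agent attached and profiling_inferred_spans_enabled=true and once without, then compare the failure counts. This is only a sketch: the class name, file names, payload size and default iteration count are illustrative assumptions, and the harness does not configure or attach the agent itself.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical CI stress test: repeatedly create, write, and close small files in a
// target directory and count how many cycles fail with an IOException.
public class CloseStress {

    /** Runs the given number of write/close cycles in dir and returns the number of failures. */
    public static int run(File dir, int iterations) {
        int failures = 0;
        byte[] payload = new byte[4096];
        for (int i = 0; i < iterations; i++) {
            File f = new File(dir, "close-stress-" + i + ".bin");
            try (FileOutputStream out = new FileOutputStream(f)) {
                out.write(payload);
            } catch (IOException e) {
                // Covers open, write, and the implicit close() where the reported error appeared.
                failures++;
                System.err.println("iteration " + i + ": " + e);
            } finally {
                // Clean up so repeated runs do not fill the share.
                f.delete();
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "/share/virusscan");
        int iterations = args.length > 1 ? Integer.parseInt(args[1]) : 1000;
        int failures = run(dir, iterations);
        System.out.println(failures + " of " + iterations + " write/close cycles failed");
        // A non-zero exit code lets the CI job flag the run as failed.
        System.exit(failures == 0 ? 0 : 1);
    }
}
```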

Thanks for the help and the fast replies, though! I'm already happy that APM works without profiling_inferred_spans for the legacy application I'm working on 🙂