High number of org.elasticsearch.index.IndexNotFoundException exceptions detected by Dynatrace

We have deployed Elasticsearch 7.6.1 onto a Kubernetes cluster for log monitoring. Logs are collected by Fluent Bit and sent to Elasticsearch, and everything seems to be working fine. But yesterday one of the admins contacted me about a large number (700,000+ in 10 hours) of exceptions (org.elasticsearch.index.IndexNotFoundException and sun.nio.fs.UnixException) being detected by Dynatrace. Looking into it, these exceptions happen all the time; I suspect they are triggered by every "chunk" sent from Fluent Bit.

I confirmed that Elasticsearch appears to be working fine and log messages are flowing in as expected, and there are no ERROR (or WARNING) messages in the Elasticsearch log indicating a problem. But I'm being asked to explain why the exceptions are occurring.

The IndexNotFoundException exceptions mention a specific index name, which I know is the index Fluent Bit is sending documents to. But we have an ingest pipeline that intercepts the incoming documents and redirects them to different indices based on fields within the message. So no documents are ever written to that "dummy" index and it is never created. I'm confused about why Elasticsearch is attempting to verify that the index exists and, given the tight correlation between the Elasticsearch exception and the file-system exception, why it is (apparently) making calls to the file system to check for it.
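
For what it's worth, here is a rough sketch of the kind of setup I mean (the index, pipeline, and field names below are placeholders rather than our actual configuration). Fluent Bit writes every chunk to a single placeholder index, and the ingest pipeline rewrites _index so the documents land elsewhere:

    # Fluent Bit output (fluent-bit.conf) -- everything targets one placeholder index
    [OUTPUT]
        Name     es
        Match    kube.*
        Host     elasticsearch.logging.svc
        Port     9200
        Index    fluentbit-placeholder
        Pipeline route-by-field

    # Elasticsearch ingest pipeline -- rewrites _index based on a document field
    PUT _ingest/pipeline/route-by-field
    {
      "description": "Route documents to per-application indices",
      "processors": [
        {
          "set": {
            "field": "_index",
            "value": "logs-{{application}}"
          }
        }
      ]
    }

So the Index value in the Fluent Bit output only ever acts as a routing placeholder; nothing is indexed into it directly.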

I went ahead and created the "dummy" index this morning, assuming this would eliminate the exceptions. Strangely, while the number of Elasticsearch exceptions dropped significantly (by about 75%), they weren't eliminated completely. Thinking about it now, I realize I never added any documents to the dummy index, so maybe the index metadata exists but no index files exist on disk yet.
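
(By "created" I just mean an empty index with no documents, something along these lines; the name and settings here are placeholders:)

    PUT fluentbit-placeholder
    {
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
      }
    }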

In any case, can anyone clarify why these exceptions are being thrown when nothing is actually being sent to this "dummy" index?

Thanks!

I imagine you're detecting one of the places where throwing an exception was simpler or more correct than the alternative control-flow mechanisms, so this could well be expected behaviour. For example, the only way to truly determine whether a file exists and is accessible is to see whether opening it throws an exception. The answer may simply be to stop monitoring Elasticsearch at this level and treat it as a self-contained black box; if it were written in something other than Java you probably wouldn't be able to see this at all.
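
As a rough illustration of that pattern at the JDK level (just a sketch, not Elasticsearch's actual code, and the path is made up):

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.NoSuchFileException;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ExistenceCheck {
        public static void main(String[] args) {
            // Hypothetical on-disk location of an index's data
            Path indexDir = Paths.get("/var/lib/elasticsearch/nodes/0/indices/example");

            // A separate "does it exist?" check would be racy, so the robust approach
            // is to attempt the operation and handle the failure. On Linux the JDK
            // performs the syscall, throws sun.nio.fs.UnixException internally on
            // ENOENT, and translates it into NoSuchFileException; that internal
            // exception is likely what a deep-instrumentation tool such as Dynatrace
            // is counting, even though nothing is logged as an error.
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir)) {
                stream.forEach(p -> System.out.println("found: " + p));
            } catch (NoSuchFileException e) {
                System.out.println("no such directory: " + indexDir);
            } catch (IOException e) {
                System.out.println("exists but not readable: " + e);
            }
        }
    }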

Can you share the stack trace that corresponds with the exception anyway, in case there's an opportunity to streamline something here?

@DavidTurner Thanks for responding. I think you are probably right that these are benign exceptions that are only being detected because an admin plugged Dynatrace into the bowels of the application. I will pass along your response to the team and close the issue.

I don't have a traditional stack trace but I pulled the following pseudo-trace from the Dynatrace UI. (I didn't see a "print traditional stack trace" item in that application.) I hope you find it helpful: https://gist.github.com/gsmith-sas/49005b8b2e3f990807660aaaf28a67cf.

Thanks again.

Thanks, that tells us this exception is coming from some Open Distro for Elasticsearch modifications to the real Elasticsearch. We don't recommend or support ODFE, so we can't offer any further help here. Best to use the official distribution instead.


Understood. I didn't notice the ODFE items in the stack trace until I posted the gist; I would have posted to their board if I had. I appreciate your response in any case.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.