Is it safe to expose Elasticsearch to the Internet?

If you search the web for information on publicly accessible Elasticsearch instances, you'll find a lot of articles about ransomware attacks in which databases were hijacked. You'll also find this DZone article from 2017, which says:

Whatever you do, never expose your cluster nodes to the web. This sounds obvious, but evidently this isn't done by all. Your cluster should never-ever be exposed to the public web.

It goes on to recommend writing a small proxy service with limited functionality. The client talks to the proxy, the proxy talks to Elasticsearch over a private interface. (This article, from a company trying to sell a product, goes into some of the pitfalls of this approach.)
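To make the pattern concrete, here's a minimal sketch of such a proxy, assuming Python with Flask and requests; the index name, field names, and private address are placeholders:

```python
# Minimal proxy sketch: the client supplies only a search term and the proxy
# builds the actual query, so raw query DSL never crosses the trust boundary.
from flask import Flask, jsonify, request
import requests

app = Flask(__name__)
ES_PRIVATE = "http://10.0.0.5:9200"  # private interface, not routable from the Internet

@app.route("/search")
def search():
    term = request.args.get("q", "")[:100]  # cap the input size
    body = {"query": {"match": {"title": term}}, "size": 10}
    resp = requests.post(f"{ES_PRIVATE}/wikipedia/_search", json=body, timeout=5)
    hits = resp.json().get("hits", {}).get("hits", [])
    # Return only the fields the frontend needs, not the raw ES response.
    return jsonify([{"id": h["_id"], "title": h["_source"].get("title")} for h in hits])

if __name__ == "__main__":
    app.run(port=8080)
```

Each approved query pattern gets its own limited endpoint like this one.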

However, this advice seems to predate the general availability of the X-Pack Security plugin. The documentation talks about authenticating users, roles, and so on, but doesn't directly answer the question:

Provided you configure it properly (i.e., unauthenticated users cannot read or write data they shouldn't be able to), is it safe to expose an Elasticsearch server to the Internet without a proxy?
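By "configure it properly" I mean something along these lines — a sketch using the X-Pack security REST API, with made-up role, user, and index names:

```python
# Sketch: create a read-only role scoped to one index, and a user that
# holds nothing but that role, via the X-Pack security REST API.
import requests

ES = "https://localhost:9200"
ADMIN_AUTH = ("elastic", "changeme")  # placeholder credentials

# Read-only role restricted to the public index.
requests.put(
    f"{ES}/_security/role/public_search",
    json={"indices": [{"names": ["wikipedia"], "privileges": ["read"]}]},
    auth=ADMIN_AUTH,
    verify=False,  # demo only; verify certificates in production
)

# A user bound to that single role.
requests.put(
    f"{ES}/_security/user/search_client",
    json={"password": "a-long-random-password", "roles": ["public_search"]},
    auth=ADMIN_AUTH,
    verify=False,
)
```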

For example, if X-Pack has had a history of security vulnerabilities, or if it is possible to tie up compute resources by sending highly complex queries, it would not be advisable to expose its port directly to clients.
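To illustrate the second concern, here's a sketch of a request that is legitimate in form but expensive to execute (index and field names are made up):

```python
# Sketch: a syntactically valid but potentially expensive request.
# Leading-wildcard queries force a scan over the term dictionary, and a
# large terms aggregation fans out across every shard.
import requests

expensive = {
    "query": {"wildcard": {"body": {"value": "*a*b*c*"}}},  # leading wildcard
    "aggs": {"by_term": {"terms": {"field": "category.keyword", "size": 10000}}},
}
requests.post("http://localhost:9200/wikipedia/_search", json=expensive, timeout=30)
```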

My personal opinion is that you should never do this, as it relies on a single layer of complex security sitting on top of your critical datastore. That's just not enough, and you also open yourself up to DDoS attacks and other ways to burn CPU cycles, or worse.

MySQL is a good analogy: it requires a username and password, but no one would ever put it on the Internet with that as the only protection.

Use a proxy (I wish one existed, or that I had time to write one). Beyond that, why do you want to put it out there at all? Presumably there is a limited set of clients, and it usually makes little sense to put what is effectively a database directly online in public.

Thanks @Steve_Mushero. Consider a basic use case where you are simply providing a web frontend to search an index of public Wikipedia articles. If someone wants to write their own client to query the database, that's fine as well, as long as it doesn't disrupt the service for others.

Elasticsearch already provides a search REST API that the client's browser could speak directly to. The API supports every complex query a user could dream of, plus scrolling for huge result sets.
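For instance, nothing stops a client from opening a scroll and walking the entire index; a sketch, with a hypothetical public endpoint:

```python
# Sketch: the scroll API lets any client page through an entire index,
# holding a search context open on the server between requests.
import requests

ES = "http://search.example.com:9200"  # hypothetical public endpoint

resp = requests.post(
    f"{ES}/wikipedia/_search",
    params={"scroll": "1m"},
    json={"size": 1000, "query": {"match_all": {}}},
).json()

while resp["hits"]["hits"]:
    # ... process the page of results ...
    resp = requests.post(
        f"{ES}/_search/scroll",
        json={"scroll": "1m", "scroll_id": resp["_scroll_id"]},
    ).json()
```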

Writing a proxy is doable, but where is the guidance on this? Identifying and blocking malicious queries is error-prone, while writing a separate interface for each approved query pattern is limited by the imagination and patience of the developer.

I would never recommend exposing any application that hosts critical data to the Internet. Even if you have gold-class security, someone will find a way in.

But in the case of Elasticsearch, what we do is summary-index or re-index certain data (or send it via a pipeline) to a second Elasticsearch cluster that can be safely exposed.
For example, if the raw customer traffic contains a lot of sensitive information, but you need to expose to the Internet how much traffic there was:

  • Index your raw dataset into your main ELK stack.
  • Output only what is just enough to showcase your business into another pipeline, and send that to an Elasticsearch cluster outside your security realm, so even if someone gets hold of the data, they only get limited information (see the sketch after this list).
  • Ensure the data is pushed out; the external cluster should never pull from you.
  • Ensure the relevant firewall and security rules apply to the outgoing pipeline, with no way for it to connect backward.
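Here's a rough sketch of that push pattern in Python with requests; the cluster addresses, index names, field, and aggregation are all placeholders:

```python
# Sketch: aggregate traffic totals on the internal cluster and push only
# the summary documents to the external, Internet-facing cluster.
# All addresses, indices, and fields here are illustrative.
import requests
from datetime import date

INTERNAL = "http://10.0.0.5:9200"           # private cluster holding the raw data
EXTERNAL = "https://public-es.example.com"  # exposed cluster, summaries only

# Pull just an aggregate from the raw traffic index.
agg = requests.post(
    f"{INTERNAL}/traffic-raw/_search",
    json={"size": 0, "aggs": {"total_bytes": {"sum": {"field": "bytes"}}}},
).json()

summary = {
    "day": date.today().isoformat(),
    "total_bytes": agg["aggregations"]["total_bytes"]["value"],
}

# Push the summary outward; the external cluster never connects back in.
requests.post(f"{EXTERNAL}/traffic-summary/_doc", json=summary)
```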

There's this page which says:

NOTE: Elasticsearch installations are not designed to be publicly accessible over the Internet. IP Filtering and the other capabilities of the Elasticsearch security features do not change this condition.

There's also this page which says:

Do not expose Elasticsearch to the Internet, instead have an application make requests on behalf of the Internet. Do not entertain the thought of having an application "sanitize" requests to Elasticsearch. Understand that it is possible for a sufficiently determined malicious user to write searches that overwhelm the Elasticsearch cluster and bring it down.

I do think it's fair to say that this important information is rather well hidden in these docs. Would you open a GitHub issue to suggest making it more prominent?


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.