Hi -- I need to provide access to some Elasticsearch indices from both EC2
(where it is hosted) and Heroku. I've been reading around this a bit, and
the two suggested solutions seem to be either the Jetty plugin or a proxy.
Of the two, I'd probably be most comfortable setting up a proxy using nginx
and having that use SSL, as I'm not really au fait with Java and its
ecosystem.
That said, the main issue has to be performance, so I guess my question
should be whether there's a difference between the two approaches, and
which one people would recommend?
Cheers,
Doug.
I handle my SSL (and other frontend stuff) with nginx, forwarding queries
to a JBoss web app that handles query transformations and XML/XSLT
rendering of the JSON results, and I moved ES into a private network.
Thanks for your response -- that sounds a bit deeper than what I'm looking
for, I think!
Basically, I want to use Elasticsearch securely, and have been researching
that.
Currently, I'm thinking that if I use nginx as a proxy, with basic HTTP
authentication and SSL, that should do it.
Does that strike the group as acceptable?
I'd make HTTPS calls to my server on port 9243 (for instance) with the
basic HTTP auth username and password as part of the URL, and have nginx
configured to listen on that port with my SSL credentials.
This should ensure that the username and password sent to the server are
encrypted, shouldn't it?
Then in nginx, as long as the username and password were good, I'd proxy
pass on to the Elasticsearch instance on port 9200.
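Roughly, this is the sort of thing I have in mind (the server name,
certificate paths and htpasswd file below are just placeholders):

    server {
        listen 9243 ssl;
        server_name es-proxy.example.com;           # placeholder

        ssl_certificate     /etc/nginx/ssl/es-proxy.crt;
        ssl_certificate_key /etc/nginx/ssl/es-proxy.key;

        auth_basic           "Elasticsearch";
        auth_basic_user_file /etc/nginx/.htpasswd;  # created with the htpasswd tool

        location / {
            # only reached once basic auth has succeeded;
            # assumes nginx runs on the same box as ES
            proxy_pass http://127.0.0.1:9200;
        }
    }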
Does that sound sensible?
Cheers,
Doug.
If you expose the ES server to users, you need to ensure they can only
execute the requests they should, for example search and not index
deletion. The Jetty plugin has role-based security. You could probably
achieve something similar with proxy-based rules.
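A rough sketch of such rules in nginx (the paths and methods here are only
an illustration of the idea, not a complete policy) could be:

    # allow searches, refuse everything else (index creation/deletion,
    # settings changes, and so on)
    location ~ ^/[^/]+/_search {
        limit_except GET POST { deny all; }
        proxy_pass http://127.0.0.1:9200;
    }
    location / {
        return 403;
    }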
Sorry, I should have explained -- it's not so much to expose the API to
users, but so that I can use Elasticsearch from two sets of servers in
different domains -- one in an Amazon VPC (where ES is hosted), and one in
Heroku.
I'd like any access to the ES indices to be as protected as possible.
If you want security from Amazon VPC or Heroku, I have no answer - ask the
vendors.
SSL is for securing end-to-end access in front of ES, over the internet
between the user and the data center. But SSL is not a protection against
violations of data privacy.
If you want to keep your data private, set up a private network, with extra
switches and extra cabling in your data center, and guard the doors with a
security service so nobody can just walk in and copy your index to USB
sticks, to cloud storage, or whatever.
So it's not possible to provide secure ES access on the open web without
your own data centre?
Sorry if I'm misunderstanding (I rather suspect I am), but I find that hard
to believe.
I was thinking that maybe by using a reverse proxy like nginx I could add a
layer of security before ES gets called.
But maybe I should be asking this on an nginx forum.
I just thought people here might have solved the same specific problem.
Cheers anyway,
Doug.
> I was thinking that maybe by using a reverse proxy like nginx I could add
> a layer of security before ES gets called.
Sure, I don't see a reason why this shouldn't work. Another option might be
spiped. See [1] for a blog post about it and [2] for the download of the
current version.
With spiped you don't have to teach your clients to use HTTPS and basic
auth. It's also pretty easy to set up.
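For illustration, a minimal spiped setup might look like this (host names
and paths are placeholders; the key is a shared secret, so it has to be
copied over a secure channel such as scp):

    # generate a shared key and copy it to the client machine
    dd if=/dev/urandom bs=32 count=1 of=/etc/spiped/key
    scp /etc/spiped/key app-host.example.com:/etc/spiped/key

    # on the ES host: accept encrypted connections on 8443,
    # hand them to ES on 9200
    spiped -d -s '[0.0.0.0]:8443' -t '[127.0.0.1]:9200' -k /etc/spiped/key

    # on the client host: expose a local 9200 that is tunnelled,
    # encrypted, to the ES host
    spiped -e -s '[127.0.0.1]:9200' -t 'es-host.example.com:8443' -k /etc/spiped/key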
You set up your AWS security groups / network ACLs appropriately, i.e. lock
down all ES ports (not just 9200) so that the world cannot get to them
directly.
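A rough sketch of that with the AWS CLI (the group IDs are made up; the
idea is that only the proxy host's security group may reach the ES ports):

    # HTTP API and transport ports, reachable only from the proxy's
    # security group
    aws ec2 authorize-security-group-ingress --group-id sg-11111111 \
        --protocol tcp --port 9200-9400 --source-group sg-22222222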
That's really interesting, cheers -- I considered an SSL tunnel, but didn't
really know enough about it to know if it was a good idea for a production
dependency -- spiped looks like it might sort that out.
One problem is that Heroku, being what it is, isn't as open to the
dev/sysadmin as other boxes, so I think I'd struggle to get spiped
installed and running.
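The nginx/basic-auth route needs nothing installed on the Heroku side,
though; the dyno would just make plain HTTPS calls, for example (placeholder
host and credentials):

    curl -u es_user:s3cret \
        'https://es-proxy.example.com:9243/myindex/_search?q=title:test'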
> If you use tools with symmetric key exchange, you must find a method to
> transport the keys securely.
I assume the OP would be fine with copying the file via scp.
> Symmetric key exchange is a security level with state of the art dating
> back to the mid-1970s. Since then, there have been some advances, I
> suppose...
Sorry, but that's just FUD.
spiped is written by Colin Percival, who is also running tarsnap, who has
been the FreeBSD security officer for some time, and who has written some
security-related papers. Therefore, I trust his work.
ES stores all data unencrypted, and there is no way to secure data being
transmitted between two ES nodes in the cloud, for various reasons.
Even if you could with symmetric keys between nodes, you'd have to share
your key unencrypted in the config on the file system in the cloud on each
node. Now, that is really secure...
You can trust whoever's work you like, but you must first understand what
security really is. It's keeping data private, not only using encrypted
communication.
Another option I've explored, and think I will go with, is using one of the
ES providers on Heroku. They expose an endpoint we can use, and we can
leave security up to them, seeing as they presumably are the experts.
Thanks very much for the considered and helpful opinions in this thread.
Doug.
> ES stores all data unencrypted, and there is no way to secure data being
> transmitted between two ES nodes in the cloud, for various reasons.
That's what VPNs are for.
> Even if you could with symmetric keys between nodes, you'd have to share
> your key unencrypted in the config on the file system in the cloud on
> each node. Now, that is really secure...
If an attacker has access to your filesystem, you've already lost. What's
your point? The OP apparently trusts Amazon and Heroku not to mess with his
data.
> You can trust whoever's work you like, but you must first understand what
> security really is. It's keeping data private, not only using encrypted
> communication.
Please read the OP's original question. He's basically asking for a secure
way to talk to an Elasticsearch cluster. That means encrypted
communication, authentication and authorization. Both spiped and an
SSL-enabled nginx proxy requiring basic auth provide that.
Since you are using Heroku, I'd go with SSL and basic authentication set up
in nginx together with proxying. That way only authenticated requests get
into your ES cluster. You could enhance security further by setting up a
firewall on the ES side that blocks anything coming from outside a
whitelisted set of IPs. Finally, instead of basic authentication, you could
also use client-side certificates. That way, only clients with the correct
certificates can get in. It goes without saying that if you use SSL, you
should be pooling your connections. Otherwise, you end up paying a huge
per-request overhead in the form of SSL handshakes.
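As a rough sketch of the client-certificate variant (the CA and file paths
are placeholders), the nginx side would be along these lines:

    server {
        listen 9243 ssl;

        ssl_certificate        /etc/nginx/ssl/es-proxy.crt;
        ssl_certificate_key    /etc/nginx/ssl/es-proxy.key;

        # only clients presenting a certificate signed by this CA get through
        ssl_client_certificate /etc/nginx/ssl/client-ca.crt;
        ssl_verify_client      on;

        location / {
            proxy_pass http://127.0.0.1:9200;
        }
    }

On the client side, keeping connections open and reusing them, rather than
opening a fresh HTTPS connection per request, is what avoids the repeated
handshake cost mentioned above.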
Whether you let Jetty or nginx do the SSL is really a matter of taste. Both
are pretty capable solutions. Nginx is quite simple to set up though, and
has the advantage that you can run any off-the-shelf Elasticsearch
distribution without bothering with plugins.
Jilles