Using part of ES as a public, frontend API?


(Bartosz Pietrzak) #1

Quick question (as I have not seen any opinions on this topic): would
using part of ES as a public application's API be a good idea?

Our app relies heavily on search (livesearch, in general), so cutting out
every intermediate layer (the Ruby app, in this case) would be a huge
performance boost. Since ES uses JSON, the obvious thing that came to my
mind was to use it directly as the backend API for livesearch. Bad
idea, good idea?


(Lukáš Vlček) #2

Hi,
I would rather not do it. A couple of reasons why I think this is not a good idea:

  1. The API can change. Although the ES API has been very stable for the
    most part, release 1.0 is still not here.
  2. Depending on what part of the API you want to expose, you should be
    careful. Even if you expose only the search-related API, it would allow
    detailed inspection of your index structure, and then anybody could
    create queries that put unnecessary load on your servers.
  3. If you want to log user activity, you should do it before
    the request hits the first ES node.

I would personally recommend the opposite: write your own proxy that
exposes only the minimal function set.
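
A minimal sketch of what the core of such a proxy might look like, in
Python. The field name `title`, the limits, and the choice of a `prefix`
query are all assumptions made for illustration; the point is only that
the client sends a plain search term and the proxy alone decides the
query shape that reaches ES.

```python
import json

# Hypothetical limits and field name; adjust to your own index.
MAX_TERM_LENGTH = 64
MAX_RESULTS = 10
SEARCH_FIELD = "title"


def build_livesearch_query(term, size=MAX_RESULTS):
    """Turn a raw user term into the only query shape the proxy allows."""
    if not isinstance(term, str):
        raise ValueError("search term must be a string")
    term = term.strip().lower()
    if not term or len(term) > MAX_TERM_LENGTH:
        raise ValueError("search term is empty or too long")
    # Clamp the page size so a client cannot request huge result sets.
    size = max(1, min(int(size), MAX_RESULTS))
    return {
        "size": size,
        "query": {"prefix": {SEARCH_FIELD: term}},
    }


if __name__ == "__main__":
    # The proxy would send this body to ES itself; the client never sees it.
    print(json.dumps(build_livesearch_query("  Elastic "), indent=2))
```

Because the client never submits raw query JSON, the index structure stays
hidden and expensive query types cannot be requested.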

Regards,
Lukas


(Eric Mill) #3

I've been thinking about this too - is there any way within ES to limit
public access to the read-only endpoints, or is that something that has to
be configured within the hosting web server (not sure how the web server
world works in Javaland)?

-- Eric


(Tim Hawkins) #4

Guys, this is seriously not a good idea. ES is designed as a systems component, meant to operate inside a closed network as part of a back-end system. It does not have the security and intrusion hardening needed to operate as a directly connected internet service.

As stated before, put a service between ES and the web to buffer, rate limit, log, and block dangerous options.

Also remember that even with a read-only query system, if you allow arbitrary queries from hosts that had little or no involvement in the design and construction of the data sets provided, then you WILL get major performance issues. Again, an intervening service would limit queries to the operations that your data sets can safely provide.
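
As one illustration of the rate-limiting part, here is a small token-bucket
sketch in Python. The class name, parameters, and injectable clock are
assumptions for the example; an intervening service could keep one bucket
per client and reject requests when `allow()` returns False.

```python
import time


class TokenBucket:
    """Allow about `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock  # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Consume one token if available; return whether the request may pass."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A bucket with `rate=5, capacity=10` would let a client burst ten livesearch
requests and then settle at roughly five per second.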

--
Tim Hawkins


(Eric Mill) #5

ElasticSearch also supports JSONP and powers the search on elasticsearch.org. I
see your points, and even agree that it is likely a bad idea, but you didn't
answer my question. Is there any way within ES to limit destructive
endpoints to credentialed users, or should this be done at the web server
level?

-- Eric


(Lukáš Vlček) #6

Hi,

there is nothing in ES out of the box to support user roles, authorization,
authentication and secure access. You need to handle that yourself.

Regards,
Lukas


(Karel Minarik) #7

a) It is very easy to support authorization, authentication, secure
access and much more at the HTTP level. See this gist for
inspiration: https://gist.github.com/986390

b) It is not convenient to directly expose the ES API to your
users. I don't believe a thin Ruby wrapper around ES adds any
significant overhead. And you'll need things like authentication,
throttling, and storing analytics anyway...
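
To make point (a) concrete, here is a small Python sketch of an HTTP Basic
authentication check that a thin wrapper could run before forwarding
anything to ES. It is not taken from the gist above; the `USERS` table and
the function name are hypothetical, and a real deployment would keep
credentials somewhere safer and serve everything over HTTPS.

```python
import base64
import binascii
import hmac

# Hypothetical credential store; do not hard-code secrets in real code.
USERS = {"search-client": "s3cret"}


def is_authorized(authorization_header):
    """Return True if the header carries valid HTTP Basic credentials."""
    if not authorization_header:
        return False
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    expected = USERS.get(user)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking how much of the password matched.
    return hmac.compare_digest(password.encode("utf-8"), expected.encode("utf-8"))
```

The wrapper would answer with 401 whenever this returns False, so
unauthenticated requests never reach ES at all.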

Karel


(Mahendra M) #8

Hi,

In my case, I started off by exposing CouchDB APIs to external clients, but
over time I have also exposed ElasticSearch APIs to our data.

My setup is something like this:

  1. nginx (security, HTTPS, limiting requests to GET/POST, etc.). This also
    does load balancing across the layers below.
  2. A Django (Spawning + eventlet) server handling auth and ACLs.
  3. ElasticSearch beneath Django.
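
The method restriction in the first layer can be sketched as a small
Python check. This is only an illustration, not my actual nginx
configuration: the function name and the rule that only
`/<index>/_search` and `/<index>/<type>/_search` paths are accepted are
assumptions for the example.

```python
# Hypothetical allowlist: only search requests may reach ElasticSearch.
ALLOWED_METHODS = {"GET", "POST"}


def is_request_allowed(method, path):
    """Return True only for GET/POST requests to an index's _search endpoint."""
    if method.upper() not in ALLOWED_METHODS:
        return False
    # Ignore the query string when checking the endpoint.
    path = path.split("?", 1)[0]
    segments = [s for s in path.split("/") if s]
    # Expect /<index>/_search or /<index>/<type>/_search, nothing else.
    if len(segments) not in (2, 3) or segments[-1] != "_search":
        return False
    # Reject other underscore-prefixed segments such as _all or _cluster.
    return not any(s.startswith("_") for s in segments[:-1])
```

Note that this only checks the method and the path. A POST to `_search`
can still carry an arbitrary query body, which is why the auth and ACL
layer still matters.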

The plan is to slowly remove the Django setup in between.

I am also planning to use the "aliases" feature to restrict which queries
can be run on ES
( https://github.com/elasticsearch/elasticsearch/issues/971 ). I have
not tried the aliases feature myself; it is just an idea.

Regards,
Mahendra

http://twitter.com/mahendra

