Does ES support conditional retrieval? (304 NOT_MODIFIED)

Hi all,
the Google REST protocol supports Conditional Retrieval:
https://developers.google.com/gdata/docs/2.0/reference#ResourceVersioning

In this model, you receive a version ID in the response to your GET
request. Next time you fire a GET at that URL you pass the version ID. If
the URL results are unchanged, you receive an HTTP 304 (NOT MODIFIED), with
no query results - which means fewer bits have to travel down the wire.

This would seem to work well for ES search queries. ES would still perform
the search, but if nothing has changed, then cached data on the client can
be re-used.
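
To make the round trip concrete, here is a minimal sketch of the model in Python. It is not ES code: the function names are hypothetical, and the version ID is assumed to be a hash of the response body (a real server could derive it differently).

```python
import hashlib

def make_etag(body: bytes) -> str:
    # Assumption: the version ID is a quoted SHA-1 of the response body.
    return '"%s"' % hashlib.sha1(body).hexdigest()

def conditional_get(body: bytes, if_none_match=None):
    """Toy server: return (status, etag, payload) for a GET.

    The server still produces the full result, but sends an empty
    payload when the client's version ID matches the current one.
    """
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, etag, b""   # NOT MODIFIED: nothing travels down the wire
    return 200, etag, body

results = b'{"hits": [1, 2, 3]}'

# First request: no version ID yet, so the full body comes back.
status, etag, payload = conditional_get(results)

# Second request: pass the version ID received earlier.
status2, _, payload2 = conditional_get(results, if_none_match=etag)
```

On the second call the client would fall back to the payload it kept from the first call.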

I welcome your advice on this,
Ian

--

Hi Ian,

ES currently has two mechanisms:

  • an If-Match header for checking if a version of a document exists

  • a HEAD request, where you can check for the existence of a document. It
    answers with HTTP 200 or

  1. http://www.elasticsearch.org/guide/reference/api/get.html

Conditional retrieval is useful if caching is much cheaper than generating
a new response at server side. That is often true for caching the
response on the client side. There might be situations when ES should
behave like a repository and not like a realtime search engine, so that
client caching seems useful. For example, serving large docs over the Get
API and a lot of clients fetching the docs in parallel could be such a
situation. But note, the Get API is very fast (it's cached on server side),
and with the extreme scalability of ES, clients will no longer have to
tackle tight server resources.

Anyway, how could conditional retrieval be implemented? The HEAD request
could be extended in the ES code to answer also with HTTP 304, e.g. by
using ETags. ES would have to deliver ETags for each document in the Get
API, which should be configurable in the settings because it adds some
overhead.
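
A rough sketch of that idea in Python follows (not ES source code). The function names and the `etags_enabled` switch are hypothetical; the only thing taken from ES is that a Get response carries `_index`, `_id` and `_version`, which is assumed here to be enough to identify one state of a document.

```python
import hashlib

def document_etag(doc: dict) -> str:
    # Assumption: index + id + version uniquely identify one state of a document.
    raw = "{}/{}/{}".format(doc["_index"], doc["_id"], doc["_version"])
    return '"%s"' % hashlib.sha1(raw.encode("utf-8")).hexdigest()

def head_document(doc: dict, if_none_match=None, etags_enabled=True):
    """Return (status, headers) for a HEAD-style check on an existing document."""
    if not etags_enabled:
        # ETag support switched off in the (hypothetical) settings: plain 200.
        return 200, {}
    etag = document_etag(doc)
    # If-None-Match may carry several comma-separated ETags.
    candidates = [tag.strip() for tag in (if_none_match or "").split(",")]
    if etag in candidates:
        return 304, {"ETag": etag}
    return 200, {"ETag": etag}
```

Because the ETag depends only on the version number, the server would not need to load the document source to answer the check.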

I don't feel that search responses should be cached like documents at
client side. Why would you issue a new query if not for requesting the most
current state? If you don't want to check for the most current state of a
query result, just do not submit the query.

Best regards,

Jörg

On Friday, November 23, 2012 12:08:23 PM UTC+1, ian mayo wrote:


--

Hi there Jörg, thanks for the thoughtful answer.

On Saturday, 24 November 2012 00:45:23 UTC, Jörg Prante wrote:

I don't feel that search responses should be cached like documents at
client side. Why would you issue a new query if not for requesting the most
current state? If you don't want to check for the most current state of a
query result, just do not submit the query.

I guess this is what I didn't explain clearly enough. The conditional
requests would be to reduce the volume of data passed in the response -
which in turn improves the responsiveness.

Here's a use-case.

We have a house-finding app, where the user looks at houses to buy using
faceted search. The user slowly adds more criteria to find the house of
interest ("has 3 bedrooms", "has off-street parking", "within 5 miles of
railway station", etc). So, a search request is made with these 3
criteria. The results are displayed, and the etag is stored. The user then
adds another criterion, "within 1 mile of school". The previous results are
cached with their etag, the new request is sent, and the new results
displayed. The user then removes the last criterion. The app knows that it
holds the cached results for the previous search. It then fires a request
at the server, providing the etag. If the contents of the search results
have changed, the server will reply with the new results. If the contents
haven't changed, it returns 304 and no search results, and the client simply
displays the cached data.
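
The client side of this flow might look roughly like the Python sketch below. Everything in it is hypothetical: `SearchCache` is an illustrative name, and `send` stands in for whatever function actually talks to the server, assumed to return a `(status, etag, results)` tuple.

```python
class SearchCache:
    """Client-side cache of search results, keyed by the set of criteria."""

    def __init__(self, send):
        # send(criteria, etag) -> (status, etag, results); supplied by the app.
        self._send = send
        self._entries = {}  # frozenset of criteria -> (etag, results)

    def search(self, criteria):
        key = frozenset(criteria)  # the order of the criteria does not matter
        cached = self._entries.get(key)
        etag = cached[0] if cached else None
        status, new_etag, results = self._send(key, etag)
        if status == 304 and cached:
            # Server says nothing changed: show what we already hold.
            return cached[1]
        # First request, or the results changed: store the fresh copy.
        self._entries[key] = (new_etag, results)
        return results
```

When the user removes the last criterion, `search` is called again with the earlier criteria, finds the stored etag, and only receives a full response if the server reports a change.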

Later, the user returns to the home page. This has a "New Houses in your
area" listing. The app fires the search to the server, with the etag from
when the listing was first shown. If the server returns 304 then the
cached results can be shown.

So, this proposal isn't about reducing server load. It's about avoiding
some very large result sets being sent down the wire in a situation where
it's unlikely (but not impossible) that the underlying data has changed.

Hope this clarifies things, Jörg.

--

Hi Ian,

thanks for describing your scenario so thoroughly! It all makes sense. I
didn't think about filters and facets that may be cached in the client app
in native code, and not sending search responses in certain situations
would reduce some processing overhead.

I have hacked the ES source code a little bit to implement simple ETag
handling and opened a pull request.

Cheers,

Jörg

--