I'm using ES as a back-end object store for an API I'm creating for a customer. ES is appropriate because the API includes some search features that benefit from ES's full text searching capabilities. But it's in most respects being used as an object store. And the client wants me to use ES for this because they're familiar with it.
Several of the API methods include pagination parameters: pageSize and pageNum.
I have two problems that don't appear to have good solutions in ES. (And by the way, my client is currently on v2.4 and wants this deployed there as well, but I haven't found much potential relief in 5.4).
First problem: Suppose a query comes in with (pageNum-1)*pageSize > 10000. (I know I can increase the index result window size, but the docs make that sound scary, and besides, I've got millions of records, so that's probably not going to fly.)
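To make the arithmetic concrete, here is a minimal Python sketch of the check I have in mind. The function names are mine, and it assumes 1-based page numbers and the default index.max_result_window of 10000, which ES applies to from + size rather than to from alone.

```python
# Default value of the index.max_result_window setting in Elasticsearch.
MAX_RESULT_WINDOW = 10000


def page_to_from(page_num, page_size):
    """Translate a 1-based pageNum/pageSize pair into an ES `from` offset."""
    return (page_num - 1) * page_size


def fits_in_window(page_num, page_size, window=MAX_RESULT_WINDOW):
    """Return True if a plain from/size search can serve this page.

    ES rejects a search when from + size exceeds the result window.
    """
    return page_to_from(page_num, page_size) + page_size <= window
```

With pageSize = 100, page 100 still fits (from = 9900, from + size = 10000), but page 101 does not.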
There are only two options I can think of to handle such a request:
1. Return an error response. Not cool, and probably not acceptable to my client.
2. Use the scroll API to retrieve and discard the first bunch of results, then continue to use that scroll to retrieve more records. (And re-use that scroll where possible for future requests.)
#2 would be OK (though very wasteful) if I could max out the scroll size while in my skipping phase and then put it back to a reasonable value for my actual results-returning phase. But the scroll API doesn't appear to pay attention to the size parameter, so my only choices appear to be a VERY slow skip phase using a small scroll size, or an ES scroll size that doesn't match my API page size, making my code a lot more complicated than it ought to be.
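For reference, this is roughly what the second choice (a scroll size that doesn't match the API page size) looks like when sketched in Python. It flattens the scroll batches into one stream of hits and slices pages out of that stream. start_scroll and next_scroll are hypothetical callables of my own that would wrap the initial scrolled search and the follow-up scroll requests; neither is a real client API, and this is a sketch rather than production code.

```python
from itertools import islice


def iter_hits(start_scroll, next_scroll):
    """Flatten scroll batches into a single stream of hits.

    start_scroll() -> (scroll_id, hits) and
    next_scroll(scroll_id) -> (scroll_id, hits) are hypothetical wrappers
    around the initial search and the subsequent scroll calls. The stream
    ends when a batch comes back empty.
    """
    scroll_id, hits = start_scroll()
    while hits:
        for hit in hits:
            yield hit
        scroll_id, hits = next_scroll(scroll_id)


def fetch_page(hit_stream, page_num, page_size):
    """Discard (page_num - 1) * page_size hits, then return one page.

    The stream is consumed only up to the end of the returned page, so the
    same stream can serve the following page afterwards.
    """
    skip = (page_num - 1) * page_size
    return list(islice(hit_stream, skip, skip + page_size))
```

After fetch_page returns, list(islice(hit_stream, page_size)) yields the next page from the same scroll, so the scroll batch size can be large while the API page size stays small. It works, but every skipped hit still crosses the wire.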
Here's my second problem: Suppose I'm scrolling along just fine with a scroll size of, say, 100, matching the API pageSize parameter. And then suppose the client doesn't hit my service for a while, and by the time they ask for the next page my ES scroll ID has expired. Again, I could send an error response, but that would suck.
In this case there's a 5.4 feature that could be useful - Search After. I'd need to make sure all my requests were sorted, and that sort keys were unique across all records, and then I'd need to remember the last sort value reported in any results returned by the now-defunct scroll. That way I could specify that sort value with search_after in a new query that's otherwise identical to the first, and then continue as normal. There'd be a bit of bookkeeping on the back end, but probably not too grotesque.
But alas that does not exist in 2.4. And besides, it does nothing for my first problem, since I'll have no way to know what sort value to start with in that case.
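The bookkeeping I'm picturing for the 5.x case would look something like the Python sketch below. resume_body is a hypothetical helper of mine, not part of any client library. It assumes the original search body carries an explicit sort whose keys are unique per document, and that I've stored the sort array of the last hit I handed back.

```python
import copy


def resume_body(original_body, last_sort_values, page_size):
    """Build a search body that resumes after the last hit already served.

    Assumes original_body has a "sort" clause with per-document unique keys
    and last_sort_values is the "sort" array of the last hit returned.
    """
    if not original_body.get("sort"):
        raise ValueError("search_after needs an explicit sort")
    body = copy.deepcopy(original_body)  # leave the caller's query untouched
    body["size"] = page_size
    body["search_after"] = list(last_sort_values)
    body.pop("from", None)  # search_after takes the place of a from offset
    return body
```

For example, with a made-up sort on a created_at field plus _uid as a tiebreaker, resume_body(query, [1496275200000, "doc#42"], 100) gives a body that is identical to the original query except for size and search_after.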
So bottom line, my question is: am I missing the "right" way to do this, preferably one that will work in 2.4?
If the answer is "no, you're pretty much stuck where you think you are," then can I make a couple feature requests?
1. Support the case where from+size > result window size, as long as size does not, by itself, exceed that size. The fact that I can programmatically use the scroll API to scroll deep into my data means that the server could do it too. Just do the same retrieve/discard loop on the server that I can do on the wire, without wasting all the bandwidth and client-side processing that's currently required.
2. Honor certain benign changes to the query in the scroll API, e.g. size and _source parameters (the former to allow resumption of desired scrolling after doing a "catch up" set of large scrolls, the latter to avoid sending most of the data during such a "skipping" phase). (My use-case for this feature would vanish if #1 were added, but there are probably other use-cases, and it seems a perfectly natural feature. And I would use it if this were implemented but #1 were not.)