Inbuilt support for pagination?


(Hari Shankar) #1

Hi,

We are trying to use es as the data store for our UI. One of the major
requirements for this is sorting and pagination of sorted data. Is there any
inbuilt support in es for pagination? I tried scroll, but it seems it does
not work for sorted data. For example, I have 10 million records, which I want to
sort on the basis of a particular (numeric) field, and I show 20
results/page. So if the user clicks on page 5, I'd like to show results
81-100 directly.

Also, how does es do sorting on numeric fields? I am assuming it puts
records into buckets of different ranges. (I am completely new to
indexing/search). Also, is fetching the top 20 results of a set faster than
fetching the top 1000, or fetching results 4001-4010, for example?

Thanks,
Hari


(ppearcy) #2

Scrolling should only be used in rare instances.

From the HTTP API, size and from should do the trick for pagination:
http://www.elasticsearch.org/guide/reference/api/search/uri-request.html
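
As a rough illustration of the size/from approach, here is a minimal Python sketch that builds a search request body for a given page. The function name `page_request` and the field name `price` are made up for this example; only the `from`, `size`, `sort`, and `query` keys come from the search API itself.

```python
import json


def page_request(page, page_size, sort_field, order="asc"):
    """Build a search request body for a 1-based page number.

    `sort_field` is whatever numeric field you sort on; "price" below
    is only a placeholder for this sketch.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        # "from" is a position in the sorted result list, not a document id
        "from": (page - 1) * page_size,
        "size": page_size,
        "sort": [{sort_field: {"order": order}}],
        "query": {"match_all": {}},
    }


# Page 5 at 20 results per page covers results 81-100, i.e. from=80.
body = page_request(page=5, page_size=20, sort_field="price")
print(json.dumps(body, indent=2))
```

The resulting body would be sent to the index's `_search` endpoint with whatever HTTP client you already use.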

Sort fields are pulled into memory, which makes the sort operations
quite fast. I'm not sure of the exact method used, though. Also, my
experience with pagination is that it is quite fast and I haven't
noticed any performance degradation, even paginating beyond the
50,000th result.

Best Regards,
Paul


(Hari Shankar) #3

But how would from+size work with sorting? The ids would not be sorted once
we sort based on another column, right? The way we are thinking right now is,
for example, to go to the second page, to check the 20th value of the sorted
field, and use this value in a "from" range query for the next request. But
it would be cumbersome to do this correctly when the sorted field is not
necessarily unique. Also, handling it when there are multiple sort fields
will be cumbersome.

Hari


(Clinton Gormley) #4

Just to be clear, the from field takes a position, not an ID, so:

page   from   size
1      0      10
2      10     10
3      20     10
50     490    10

And sort order is preserved even when the sort value is not unique.

However, the number of docs that need to be processed in order to return
page 50, for example, is (from + size) * no_of_shards = 500 * 5 = 2500
(assuming 5 primary shards).
So you really don't want to offer to return page 5 million.
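
The arithmetic above can be sketched in a few lines of Python. The function name `docs_processed` is made up for this example, and the formula is just the one stated in this post (each shard has to produce its own top from + size entries), not a measurement of what a real cluster does.

```python
def docs_processed(page, page_size, num_shards):
    """Estimate how many sorted entries are examined to serve one page.

    Every shard must return its top (from + size) entries, because the
    coordinating node cannot know in advance which shard holds the hits
    for the requested page.
    """
    offset = (page - 1) * page_size      # the "from" value for this page
    per_shard = offset + page_size       # entries each shard must produce
    return per_shard * num_shards


print(docs_processed(page=50, page_size=10, num_shards=5))         # 2500
print(docs_processed(page=5_000_000, page_size=10, num_shards=5))  # 250000000
```

The cost grows linearly with the page number, which is why capping the deepest reachable page is a reasonable design choice.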

Do as Google does and max out at 1,000 results. Who WANTS to see page 5
million anyway?

If you need to retrieve all 5 million docs that match a query, e.g. to
reindex or export them, then use a scrolled search with search_type=scan.
They won't be sorted, but it won't kill your ES server either :)

clint


(Hari Shankar) #5

Ah, I was thinking from was based on id.

Thanks,
Hari


(Shay Banon) #6

Also note that elasticsearch takes special care to try to optimize "long tail" pagination (though there is a limit, of course). The "query_then_fetch" search type makes sure to fetch only doc ids from all shards to do the pagination calculation, and only then goes and fetches the relevant docs needed.
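
To make the two phases concrete, here is a simplified Python simulation of the idea. It is only a sketch under stated assumptions, not how elasticsearch is implemented: shards are modelled as plain dicts of doc id to document, and the names `query_then_fetch`, `shards`, and `price` are invented for the example.

```python
import heapq


def query_then_fetch(shards, sort_field, offset, size):
    """Simulate two-phase pagination over several in-memory 'shards'.

    shards: list of dicts mapping doc id -> document (a dict).
    Returns the documents for positions [offset, offset + size).
    """
    # Query phase: each shard returns only lightweight
    # (sort value, doc id, shard number) entries for its top offset+size hits.
    candidates = []
    for shard_no, docs in enumerate(shards):
        keys = ((doc[sort_field], doc_id, shard_no) for doc_id, doc in docs.items())
        candidates.extend(heapq.nsmallest(offset + size, keys))

    # Coordinator: merge all shard results and cut out the requested page.
    page = sorted(candidates)[offset:offset + size]

    # Fetch phase: load full documents only for the hits on this page.
    return [shards[shard_no][doc_id] for _, doc_id, shard_no in page]


shards = [
    {"a": {"name": "a", "price": 5}, "b": {"name": "b", "price": 1}},
    {"c": {"name": "c", "price": 3}, "d": {"name": "d", "price": 2}},
]
# Global order by price is b, d, c, a; offset 1 with size 2 yields d and c.
print(query_then_fetch(shards, "price", offset=1, size=2))
```

Only `size` full documents are loaded in the fetch phase, however deep the page is; the query phase still has to rank offset + size entries per shard.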

