I am fairly new to ES, and I was wondering whether ES is capable of
pagination. My use case: if the query result is huge, can ES return only a
specific amount of data on request? For example, I might only need records
100-200 of the overall result.
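ES supports this through the `from` (number of hits to skip) and `size` (number of hits to return) parameters of the search API. Here is a minimal sketch of building such a request body. It assumes a JSON body sent to `_search`, and it reads "records 100-200" as a zero-based offset of 100 with 100 hits. The helper name `page_body` is hypothetical.

```python
import json


def page_body(query: dict, offset: int, count: int) -> dict:
    """Build a _search request body asking for `count` hits starting at zero-based `offset`."""
    if offset < 0 or count < 0:
        raise ValueError("offset and count must be non-negative")
    # "from" is how many hits to skip, "size" how many hits to return.
    return {"query": query, "from": offset, "size": count}


# Records 100-200 of the overall result, read as: skip 100 hits, return the next 100.
body = page_body({"match_all": {}}, offset=100, count=100)
print(json.dumps(body))  # {"query": {"match_all": {}}, "from": 100, "size": 100}
```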
Thanks for the information, it is very useful. One more question: does the
data returned from ES contain the total result count when the request
contains the size parameter?
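The total match count is reported in `hits.total` regardless of `size`. A response that carries only 2 hits can still report 1234 total matches. The shape of that field depends on the version. Older releases return a plain integer. In 7.x and later it is an object with `value` and `relation`, and the count may be only a lower bound (`"gte"`) unless `track_total_hits` is enabled. The sketch below reads either form. The sample response is made up for illustration.

```python
from typing import Tuple


def total_hits(response: dict) -> Tuple[int, bool]:
    """Return (count, is_exact) from the hits.total field of a search response."""
    total = response["hits"]["total"]
    if isinstance(total, int):
        # Older releases report a plain number.
        return total, True
    # 7.x and later: {"value": N, "relation": "eq"} or {"value": N, "relation": "gte"}.
    return total["value"], total["relation"] == "eq"


# Made-up, trimmed response to a request that used "size": 2.
sample = {
    "hits": {
        "total": {"value": 1234, "relation": "eq"},
        "hits": [{"_id": "a"}, {"_id": "b"}],
    }
}
count, exact = total_hits(sample)
print(count, exact, len(sample["hits"]["hits"]))  # 1234 True 2
```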
One thing to keep in mind is that if you are continuously indexing,
don't expect to be able to page through the results without missing
some items or seeing duplicates.
You can use the "scan" search type instead (see the Elasticsearch docs page
.../reference/api/search/search-type.html) if you want a
consistent result set that you can page through, but unfortunately you
will not be able to sort the results when using "scan".
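Here is why that happens. With from/size, every page is a fresh query. A document indexed or deleted between two requests therefore shifts the position of everything ranked after it. The small in-memory simulation below uses plain Python lists, sorted newest first, to stand in for the index. It is not Elasticsearch code.

```python
def fetch_page(index: list, offset: int, size: int) -> list:
    """Re-run the 'query' (newest first) and slice one page out of it, like from/size does."""
    ranked = sorted(index, key=lambda doc: doc["ts"], reverse=True)
    return [doc["id"] for doc in ranked[offset:offset + size]]


def make_index() -> list:
    return [{"id": f"doc{i}", "ts": i} for i in range(1, 7)]  # doc1..doc6


# A document indexed between the two page requests: doc4 is shown twice.
index = make_index()
dup_p1 = fetch_page(index, 0, 3)
index.append({"id": "doc7", "ts": 7})
dup_p2 = fetch_page(index, 3, 3)
print(dup_p1, dup_p2)  # ['doc6', 'doc5', 'doc4'] ['doc4', 'doc3', 'doc2']

# A document deleted between the two page requests: doc3 is never shown.
index = make_index()
skip_p1 = fetch_page(index, 0, 3)
index = [doc for doc in index if doc["id"] != "doc6"]
skip_p2 = fetch_page(index, 3, 3)
print(skip_p1, skip_p2)  # ['doc6', 'doc5', 'doc4'] ['doc2', 'doc1']
```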
Interesting, I hadn't thought of that. So the timestamp would be the
"create date" of the document? I'm not sure how filtering would work when
documents are updated or deleted: would Elasticsearch keep an updated
record in the same "place" while paginating? And I don't see a
solution if documents are deleted while paging.
--Jamshid
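One way to read the timestamp idea Jamshid is asking about is a snapshot filter. You record the time of the first page request and add a range filter, so that documents created later are excluded from every page. The sketch below builds such a query body. It assumes a hypothetical `created_at` date field that is set once, when the document is created. It is written in the query DSL of 2.x and later (`bool` with a `filter` clause); older releases used the `filtered` query for the same purpose. As Jamshid suspects, this does nothing for deletions.

```python
from datetime import datetime, timezone


def snapshot_page_body(query: dict, snapshot_iso: str, offset: int, size: int) -> dict:
    """from/size page limited to documents created at or before a fixed snapshot time."""
    return {
        "query": {
            "bool": {
                "must": query,
                # Hypothetical field, set once when the document is first indexed.
                "filter": {"range": {"created_at": {"lte": snapshot_iso}}},
            }
        },
        "sort": [{"created_at": "desc"}],
        "from": offset,
        "size": size,
    }


# Take the snapshot when the first page is requested and reuse it for every later page,
# so documents indexed in the meantime are excluded from all pages.
snapshot = datetime.now(timezone.utc).isoformat()
first = snapshot_page_body({"match_all": {}}, snapshot, 0, 100)
second = snapshot_page_body({"match_all": {}}, snapshot, 100, 100)
```

Documents that share the same `created_at` value can still come back in a different order between requests unless a unique tie-breaker field is added to the sort.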
You can also use scrolling with a regular search type (such as the default,
query_then_fetch), which guarantees that you won't see duplicates or changed
data, but it gets more and more expensive as you scroll through the results.
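The scroll protocol itself is simple. The first search, sent with a `scroll` keep-alive such as `1m`, returns a `_scroll_id`. Each follow-up call to the scroll endpoint with the latest id returns the next batch, until an empty batch comes back. Here is a sketch of that client loop. The transport is injected as two functions so the code runs without a cluster. `FakeCluster` is a hypothetical stand-in that mimics the snapshot behaviour described above; it is not Elasticsearch code.

```python
from typing import Callable, Dict, Iterator, List


def scroll_all(open_scroll: Callable[[], dict],
               next_batch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every hit: open the scroll once, then keep following the latest _scroll_id
    until a batch comes back empty."""
    resp = open_scroll()
    while resp["hits"]["hits"]:
        yield from resp["hits"]["hits"]
        resp = next_batch(resp["_scroll_id"])


class FakeCluster:
    """Hypothetical stand-in for a cluster: freezes a copy of the docs when a scroll opens."""

    def __init__(self, docs: List[dict], batch: int):
        self.docs = docs
        self.batch = batch
        self.cursors: Dict[str, tuple] = {}

    def open_scroll(self) -> dict:
        self.cursors["s1"] = (list(self.docs), 0)  # snapshot; later writes don't touch it
        return self.scroll("s1")

    def scroll(self, scroll_id: str) -> dict:
        snapshot, pos = self.cursors[scroll_id]
        self.cursors[scroll_id] = (snapshot, pos + self.batch)
        return {"_scroll_id": scroll_id, "hits": {"hits": snapshot[pos:pos + self.batch]}}


cluster = FakeCluster([{"_id": str(i)} for i in range(5)], batch=2)
hits = scroll_all(cluster.open_scroll, cluster.scroll)
first = next(hits)                      # opens the scroll and takes the snapshot
cluster.docs.append({"_id": "new"})     # indexed while the scroll is in progress
ids = [first["_id"]] + [hit["_id"] for hit in hits]
print(ids)  # ['0', '1', '2', '3', '4']
```

With a real cluster you would also clear the scroll when you are done (`DELETE /_search/scroll`). That lets the cluster release its resources before the keep-alive expires.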
So you can use Elasticsearch queries to page through a large dataset using an offset and a limit.
If I have 10 million objects to page through and want them 1000 at a time
(say), will the search be very slow when I ask for "the next 1000 items starting with the 9 millionth item"?
Yes, it will get slower the further through the result set you go. Each
request has to run the query and collect results up to document number
(offset + limit), then select and return only the 1000 docs you wanted for the page.
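Here is a rough model of why. Each shard must rank its own top `from + size` hits, and the coordinating node merges all of those candidates before discarding everything before `from`. Take 5 shards as an example. Asking for 1000 items at offset 9,000,000 means each shard ranks 9,001,000 entries, and the coordinator merges 5 × 9,001,000 = 45,005,000 of them. The toy simulation below uses random floats in place of relevance scores, and the 5-shard layout is an arbitrary assumption. It shows the number of collected candidates growing with the offset while the returned page stays the same size.

```python
import heapq
import random


def query_then_fetch(shards, offset, size):
    """Toy model: each shard ranks its own top (offset + size) scores, the coordinating
    node merges those candidates, then throws away everything before `offset`."""
    k = offset + size
    per_shard = [heapq.nlargest(k, shard) for shard in shards]
    collected = sum(len(top) for top in per_shard)      # candidates sent to the coordinator
    merged = heapq.nlargest(k, (score for top in per_shard for score in top))
    return merged[offset:offset + size], collected


random.seed(0)
# 5 shards of 20,000 random "relevance scores" each (an arbitrary example layout).
shards = [[random.random() for _ in range(20_000)] for _ in range(5)]
for offset in (0, 9_000):
    page, collected = query_then_fetch(shards, offset, 1_000)
    print(offset, len(page), collected)
# 0 1000 5000
# 9000 1000 50000
```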
Scan and scroll can only move forward. If you want to have previous/next
buttons, you would typically run the query once again with different values
of from and size.
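A scroll cannot go backwards, so previous/next navigation just recomputes `from` from the page number and re-runs the query. Here is a sketch of that mapping. `page_nav` is a hypothetical helper. Pages are 1-based, and `total` is assumed to come from `hits.total` of an earlier response.

```python
def page_nav(page: int, per_page: int, total: int) -> dict:
    """Map a 1-based page number to from/size and say whether prev/next buttons apply."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    offset = (page - 1) * per_page
    return {
        "from": offset,
        "size": per_page,
        "has_prev": page > 1,                   # any page after the first can go back
        "has_next": offset + per_page < total,  # more hits exist beyond this page
    }


print(page_nav(3, 10, 25))  # {'from': 20, 'size': 10, 'has_prev': True, 'has_next': False}
```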
Scan/scroll is also not meant to be exposed to "web scale" users: it is fine
for tens of users, not for millions, because there is a non-trivial cost on
the cluster while a scan/scroll is open.
For the most part we just use from and size. There is a setting called
preference that might be worth looking at if you expect lots of paging.
Don't allow super deep paging: some portions of query execution end up
having to do from + size amounts of work (O(n*log(n)) of it, with
n = from + size), and they need O(from + size) memory too. So going too
deep can take up a lot of memory.
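The usual approach with `preference` is to pass a stable, user-specific string, such as a session id. Successive page requests then hit the same shard copies. Otherwise, replicas with slightly different scoring statistics can order equally scored hits differently from one page to the next. Here is a sketch of such a request URL. The host, the `articles` index, and the session string are placeholders. `from` and `size` are passed as URL parameters here, but they can go in the body instead.

```python
from urllib.parse import urlencode


def search_url(base: str, index: str, session_id: str, offset: int, size: int) -> str:
    """Search URL that pins every page of one user's session to the same shard copies."""
    params = {"preference": session_id, "from": offset, "size": size}
    return f"{base}/{index}/_search?{urlencode(params)}"


print(search_url("http://localhost:9200", "articles", "user-42-session", 100, 100))
# http://localhost:9200/articles/_search?preference=user-42-session&from=100&size=100
```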
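On the application side, enforcing that limit can be as simple as rejecting any request whose `from + size` exceeds a fixed window. Later Elasticsearch versions (2.1 onward) enforce a similar cap themselves through the `index.max_result_window` setting, which defaults to 10,000. The hypothetical helper below reuses that number.

```python
MAX_WINDOW = 10_000  # assumption: mirrors the default index.max_result_window


def checked_page(offset: int, size: int, max_window: int = MAX_WINDOW) -> dict:
    """Refuse pages whose from + size exceeds the window instead of sending them to the cluster."""
    if offset < 0 or size < 1:
        raise ValueError("offset must be >= 0 and size >= 1")
    if offset + size > max_window:
        raise ValueError(
            f"from + size = {offset + size} exceeds {max_window}; "
            "narrow the query or use scan/scroll for bulk export"
        )
    return {"from": offset, "size": size}


print(checked_page(9_000, 1_000))  # {'from': 9000, 'size': 1000}
```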