I use scrolling to page through large sets of documents and recently
noticed that I hadn't been setting search_type=scan even though I don't
need sorting. Once I enabled search_type=scan I noticed two inconsistencies
compared with the default search_type:
1. search_type=scan doesn't return any results when starting to scroll,
while the default search_type returns the first page of results. (The docs
seem to warn this happens, but it seems like a strange inconsistency within
the scroll API.)
2. search_type=scan doesn't seem to honor size=N
The second one is a much bigger deal as I have an API backed by an ES
scroll where users can specify the page size they want.
You should be able to fully reproduce what I'm seeing using this:
Do I need to use from=... with size=...? I didn't have to before switching
to search_type=scan.
I'm using ES 1.1.1 on Ubuntu using the deb provided by ES.org.
On Wednesday, May 21, 2014 11:20:53 AM UTC-7, schmichael wrote:
...
2. search_type=scan doesn't seem to honor size=N
It seems I missed this in the guide:
"When scanning, the size is applied to each shard, so you will get back a
maximum of size * number_of_primary_shards documents in each batch."
...but that only seems to be the case with the scan search_type. Do I just
have to divide the user's requested page size by my number of shards (5 at
this point)?
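A minimal sketch of that arithmetic, assuming the per-shard behaviour quoted from the guide. The helper names (per_shard_size, repage) are hypothetical and not part of any Elasticsearch client. Since a scan batch can hold anywhere up to size * number_of_primary_shards documents, dividing alone probably can't guarantee exact pages, so the second helper re-chunks batches on the client side:

```python
from typing import Iterable, Iterator, List


def per_shard_size(page_size: int, num_primary_shards: int) -> int:
    """Per-shard size to request so that one scan batch can hold at least
    page_size documents while every shard still has enough hits."""
    if page_size < 1 or num_primary_shards < 1:
        raise ValueError("page_size and num_primary_shards must be positive")
    # Ceiling division: round up so size * shards never falls below page_size.
    return -(-page_size // num_primary_shards)


def repage(batches: Iterable[List[dict]], page_size: int) -> Iterator[List[dict]]:
    """Re-chunk variable-sized scan batches into pages of exactly page_size
    hits. Only the final page may be shorter."""
    buffer: List[dict] = []
    for batch in batches:
        buffer.extend(batch)
        while len(buffer) >= page_size:
            yield buffer[:page_size]
            buffer = buffer[page_size:]
    if buffer:
        yield buffer
```

With 5 primary shards and a requested page of 10, per_shard_size(10, 5) gives 2 as the size to send with search_type=scan; repage then trims or merges whatever each batch actually returns.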
scan is mainly useful as a way to export data from the index. In the
context of a user interface, I think scroll would make more sense[1]. On a
side note, paging improved significantly for scroll requests in
Elasticsearch 1.2 (in terms of both speed and memory usage).
[1]
Thanks for the response Adrien. I'm excited to upgrade to 1.2, but it seems
strange to me that people refer to scan vs. scroll (you're not the first)
as scan is simply a search_type that - AFAIK - can be used for any type of
search (scroll or otherwise).
It just seems strange that setting search_type=scan changes the
behavior of _search/scroll significantly (size=N semantics differ and no
first page is returned).
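For anyone wrapping this in their own API, here is a rough sketch of one way to hide the missing first page from callers. start_scroll and next_scroll are hypothetical callables standing in for whatever client code issues the initial _search request and the follow-up _search/scroll requests. The sketch assumes each response carries "_scroll_id" and "hits.hits", and that an empty batch after the first request means the scroll is exhausted:

```python
from typing import Callable, Iterator, List


def iter_batches(start_scroll: Callable[[], dict],
                 next_scroll: Callable[[str], dict],
                 scan: bool) -> Iterator[List[dict]]:
    """Yield non-empty batches of hits, whether or not the initial
    request returns a first page."""
    response = start_scroll()
    hits = response["hits"]["hits"]
    if not scan:
        # Default search_type: the initial response already holds page one.
        if not hits:
            return
        yield hits
    # With scan, the initial response has no hits, so go straight to
    # scrolling. An empty batch here is treated as the end of the scroll.
    while True:
        response = next_scroll(response["_scroll_id"])
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield hits
```

Callers then see the same stream of batches for both search types; only the batch sizes differ.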