_search/scroll?search_type=scan bugs/inconsistencies


(schmichael) #1

I use scrolling to page through large sets of documents and recently
noticed that I hadn't set search_type=scan, even though I don't need sorting.

Once I enabled search_type=scan I noticed two inconsistencies compared
against the default search_type:

  1. search_type=scan doesn't return any results when starting to scroll
    while the default search_type returns the first page of results. (The docs
    seem to warn this happens, but it seems like a strange inconsistency within
    the scroll API).
  2. search_type=scan doesn't seem to honor size=N

The second one is a much bigger deal as I have an API backed by an ES
scroll where users can specify the page size they want.
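
To make the first inconsistency concrete, here is a minimal sketch of a scroll loop that copes with both search types. It assumes the behavior described above: the initial scan request returns a scroll ID but no hits, while the default search_type returns the first page right away. fetch_initial and fetch_next are hypothetical callables standing in for the HTTP calls to _search and _search/scroll; they are not part of any client library.

```python
def scroll_all(fetch_initial, fetch_next, scan=False):
    """Yield every hit from a scroll, for either search type.

    fetch_initial() and fetch_next(scroll_id) are hypothetical callables
    that return the parsed JSON body of the _search and _search/scroll
    responses respectively.
    """
    resp = fetch_initial()
    if scan:
        # With search_type=scan the initial response carries only a
        # scroll ID and no hits, so go straight to the first scroll call.
        resp = fetch_next(resp["_scroll_id"])
    # An empty page of hits marks the end of the scroll.
    while resp["hits"]["hits"]:
        for hit in resp["hits"]["hits"]:
            yield hit
        # Always pass along the most recently returned scroll ID.
        resp = fetch_next(resp["_scroll_id"])
```

Without the scan flag, a loop that stops on the first empty page would end immediately when search_type=scan is set.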

You should be able to fully reproduce what I'm seeing using this:

Do I need to use from=... with size=...? I didn't have to before switching
to search_type=scan.

I'm using ES 1.1.1 on Ubuntu using the deb provided by ES.org.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9f0d0560-a5bb-4ca0-8483-e6038291c780%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(schmichael) #2

On Wednesday, May 21, 2014 11:20:53 AM UTC-7, schmichael wrote:

...
2. search_type=scan doesn't seem to honor size=N

It seems I missed this in the guide:

"When scanning, the size is applied to each shard, so you will get back a
maximum of size * number_of_primary_shards documents in each batch."
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html

...but that only seems to be the case with the scan search_type. Do I just
have to divide the user's requested page size by my number of shards (5 at
this point)?
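
If dividing is the way to go, the arithmetic would look something like the sketch below. per_shard_size is a hypothetical helper, not an ES API. It relies only on the guide's statement that a batch holds at most size * number_of_primary_shards documents.

```python
def per_shard_size(requested, primary_shards):
    """Return the size= value to send with search_type=scan so that one
    batch holds at most `requested` documents (size applies per shard)."""
    if requested < 1 or primary_shards < 1:
        raise ValueError("requested and primary_shards must be positive")
    # Round down so size * primary_shards does not exceed the request.
    # Keep at least 1; if requested < primary_shards, a batch can then
    # still be larger than what the user asked for.
    return max(1, requested // primary_shards)


print(per_shard_size(100, 5))  # a 100-document page on 5 shards
```

Batches can still come back smaller than requested, for example when the request is not a multiple of the shard count or when some shards run out of documents before others.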



(Adrien Grand) #3

scan is mainly useful as a way to export data from the index. In the
context of a user interface, I think scroll would make more sense[1]. On a
side note, paging improved significantly for scroll requests in
Elasticsearch 1.2 (in terms of both speed and memory usage).

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html

--
Adrien Grand



(schmichael) #4

Thanks for the response Adrien. I'm excited to upgrade to 1.2, but it seems
strange to me that people refer to scan vs. scroll (you're not the first)
as scan is simply a search_type that - AFAIK - can be used for any type of
search (scroll or otherwise).

It just seems strange that setting the search_type=scan changes the
behavior of _search/scroll significantly (size=N semantics differ and no
first page is returned).
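
One way to hide the size difference from API users would be to re-chunk the batches on the client side. A rough sketch, where repage is a hypothetical helper and batches is any iterable of hit lists taken from successive scroll responses:

```python
def repage(batches, page_size):
    """Regroup scroll batches of arbitrary size into pages of exactly
    page_size hits (only the last page may be shorter)."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    buf = []
    for batch in batches:
        buf.extend(batch)
        # Emit as many full pages as the buffer currently holds.
        while len(buf) >= page_size:
            yield buf[:page_size]
            buf = buf[page_size:]
    # Whatever is left over once the scroll is exhausted.
    if buf:
        yield buf
```

This only helps when a single process consumes the whole scroll, since the leftover buffer has to live somewhere between pages.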


