_search/scroll?search_type=scan bugs/inconsistencies

schmichael · May 21, 2014, 6:20pm

I use scrolling to page through large sets of documents, and I recently
noticed that I hadn't been setting search_type=scan even though I don't
need sorting.

Once I enabled search_type=scan I noticed two inconsistencies compared
against the default search_type:

1. search_type=scan doesn't return any results when starting to scroll,
while the default search_type returns the first page of results. (The docs
seem to warn that this happens, but it seems like a strange inconsistency
within the scroll API.)
2. search_type=scan doesn't seem to honor size=N.

The second one is a much bigger deal as I have an API backed by an ES
scroll where users can specify the page size they want.

You should be able to fully reproduce what I'm seeing using this:

https://gist.github.com/schmichael/5c3ff512359262970d16

es_scan_size.sh

#!/bin/bash
set -o nounset
set -o errexit

echo "This script demonstrates inconsistencies between scan scrolls and scrolls with the default search type"
echo
echo "-- Create a test index and some documents"

curl -s -XPUT "http://localhost:9200/testindex" > /dev/null
curl -s -XPUT "http://localhost:9200/testindex/t/1" -d'

(The script is truncated here; the full version is in the gist linked above.)

Do I need to use from=... with size=...? I didn't have to before switching
to search_type=scan.

I'm using ES 1.1.1 on Ubuntu using the deb provided by ES.org.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9f0d0560-a5bb-4ca0-8483-e6038291c780%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

schmichael · May 21, 2014, 6:28pm

On Wednesday, May 21, 2014 11:20:53 AM UTC-7, schmichael wrote:

...
2. search_type=scan doesn't seem to honor size=N

It seems I missed this in the guide:

"When scanning, the size is applied to each shard, so you will get back a
maximum of size * number_of_primary_shards documents in each batch."

...but that only seems to be the case with the scan search_type. Do I just
have to divide the user's requested page size by my number of shards (5 at
this point)?
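The division asked about above can be sketched in shell arithmetic. This is only an illustration under stated assumptions: `per_shard_size` is a hypothetical helper (not part of Elasticsearch), and it relies on the guide's statement that a scan batch holds at most size * number_of_primary_shards documents. Rounding up is a choice made here so that a full batch is never smaller than the requested page.

```shell
#!/bin/bash
# Hypothetical helper: pick the size=N value to send with search_type=scan so
# that a full batch (size * number_of_primary_shards) holds at least
# page_size documents.
per_shard_size() {
  local page_size=$1 shards=$2
  # Integer ceiling division: bash arithmetic truncates, so add shards - 1 first.
  echo $(( (page_size + shards - 1) / shards ))
}

per_shard_size 100 5   # prints 20
per_shard_size 12 5    # prints 3, so a batch may hold up to 15 documents
```

When the page size is not a multiple of the shard count, a batch can come back larger than requested (or smaller, if some shards run out of documents), so the API layer would probably still need to buffer or trim results to the exact page size.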


jpountz · May 23, 2014, 12:09am

scan is mainly useful as a way to export data from the index. In the
context of a user interface, I think scroll would make more sense[1]. On a
side note, paging improved significantly for scroll requests in
Elasticsearch 1.2 (in terms of both speed and memory usage).

[1]


--
Adrien Grand


schmichael · May 23, 2014, 12:17am

Thanks for the response, Adrien. I'm excited to upgrade to 1.2, but it seems
strange to me that people refer to scan vs. scroll (you're not the first),
as scan is simply a search_type that, AFAIK, can be used for any type of
search (scroll or otherwise).

It just seems strange that setting the search_type=scan changes the
behavior of _search/scroll significantly (size=N semantics differ and no
first page is returned).

