Scroll Search Bug?


(Mary Hamlin) #1

Hi. I'm very new to elasticsearch and have been playing around with the
scroll/scan search functionality attempting to get it to scroll through all
of the documents in my test index so that I can perform some processing on
them. I did the following:

//i did this the first time to initiate the search
http://myserver:9200/mary/_search?search_type=scan&scroll=1m&size=100

//then i used the scroll_id returned above to get the first set of data
http://myserver:9200/mary/_search?scroll=1m&size=100&scroll_id=c2Nhbjs1OzgwMzpnZEZCcW1pQ1EzLTdhTFYwNDFOZHdROzgwMTpnZEZCcW1pQ1EzLTdhTFYwNDFOZHdROzgwMDpnZEZCcW1pQ1EzLTdhTFYwNDFOZHdROzc5OTpnZEZCcW1pQ1EzLTdhTFYwNDFOZHdROzgwMjpnZEZCcW1pQ1EzLTdhTFYwNDFOZHdROzE7dG90YWxfaGl0czozOTcxOw==

//then i used the next scroll_id returned in the previous request to get the next set of data
http://myserver:9200/mary/_search?scroll=1m&size=100&scroll_id=cXVlcnlUaGVuRmV0Y2g7NTs4MDg6Z2RGQnFtaUNRMy03YUxWMDQxTmR3UTs4MDY6Z2RGQnFtaUNRMy03YUxWMDQxTmR3UTs4MDU6Z2RGQnFtaUNRMy03YUxWMDQxTmR3UTs4MDQ6Z2RGQnFtaUNRMy03YUxWMDQxTmR3UTs4MDc6Z2RGQnFtaUNRMy03YUxWMDQxTmR3UTswOw==

Am I doing something wrong here? Note that I am getting different
scroll_id values returned for each request. However, the results that come back
are always the same, i.e. the scroll does not seem to be advancing, and it
would basically loop forever, since it's always returning the same 100
results over and over. Note that the index that I am attempting to return
results for has more than 100 records (it's something like 3000 atm), and I
have also played around with increasing the scroll time. Any help on this
would be greatly appreciated.
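
A quick diagnostic that may help here: the scroll_id values are base64 text, so their leading characters can be decoded to see what kind of search produced them. The sketch below decodes only the first few characters of the two IDs quoted above (cut at 4-character boundaries so they decode on their own). The interpretation is an assumption, not something stated in the thread: a second ID that starts with "queryThenFetch" instead of "scan" suggests the follow-up request started a fresh search rather than continuing the scan, which would explain seeing the same 100 results each time.

```python
import base64

# Leading characters of the two scroll_id values quoted in the post above,
# truncated at 4-character boundaries so each decodes cleanly by itself.
first_id_prefix = "c2Nhbjs1"
second_id_prefix = "cXVlcnlUaGVuRmV0Y2g7"


def decode_prefix(b64_text: str) -> str:
    """Decode a base64 fragment into readable ASCII text."""
    return base64.b64decode(b64_text).decode("ascii")


first_decoded = decode_prefix(first_id_prefix)
second_decoded = decode_prefix(second_id_prefix)

print(first_decoded)   # search type recorded in the first ID
print(second_decoded)  # search type recorded in the second ID
```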

--


(Radu Gheorghe) #2

Hi Mary,

Here's how I do scrolling and it works (might not be the only way, though):

  1. get the scroll ID:
    curl -XGET 'localhost:9200/my_index/my_type/_search?search_type=scan&scroll=1m&size=100'
  2. get the first 100 elements:
    curl -XGET 'localhost:9200/_search/scroll?scroll=1m&scroll_id=SCROLL_ID_FROM_STEP1_GOES_HERE'
  3, 4, 5, etc. get the next 100 elements:
    curl -XGET 'localhost:9200/_search/scroll?scroll=1m&scroll_id=SCROLL_ID_FROM_STEP1_GOES_HERE'

So you always give the scroll ID that you get at the first step.

When you get no more hits while repeating the last step you should be done.
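
The steps above could be sketched as a loop along these lines. This is a minimal sketch, not a definitive client: scan_all, http_get_json and the fetch parameter are names made up for illustration, and the host and index are placeholders. One deliberate difference from the steps above: the sketch passes along the most recently returned _scroll_id (falling back to the previous one if a response carries none), which is what the Elasticsearch scroll documentation recommends.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator


def http_get_json(url: str) -> dict:
    """Fetch a URL and parse the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def scan_all(base: str, index: str,
             fetch: Callable[[str], dict] = http_get_json,
             keep_alive: str = "1m", size: int = 100) -> Iterator[dict]:
    """Yield every hit from a scan/scroll search, one document at a time."""
    # Step 1: the initial scan goes to /{index}/_search and returns a scroll ID.
    params = urllib.parse.urlencode(
        {"search_type": "scan", "scroll": keep_alive, "size": size})
    first = fetch(f"{base}/{index}/_search?{params}")
    scroll_id = first["_scroll_id"]

    # Steps 2, 3, 4, ...: every later request goes to /_search/scroll,
    # with no index or type in the path.
    while True:
        params = urllib.parse.urlencode(
            {"scroll": keep_alive, "scroll_id": scroll_id})
        page = fetch(f"{base}/_search/scroll?{params}")
        hits = page["hits"]["hits"]
        if not hits:
            return  # no more hits: the scroll is exhausted
        yield from hits
        # Use the newest scroll ID if the response includes one.
        scroll_id = page.get("_scroll_id", scroll_id)
```

Usage would look something like `for doc in scan_all("http://localhost:9200", "my_index"): ...`. Passing fetch in as a parameter is only there so the loop can be exercised without a running cluster.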

On Friday, August 31, 2012 8:26:50 PM UTC+3, Mary Hamlin wrote:

--


(Clinton Gormley) #3

Hi Mary

The important thing to note from Radu's reply is that the first request
is against /_search and all subsequent requests against /_search/scroll
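
To make that concrete, here is a small sketch that takes a follow-up URL of the shape used in the original post and rewrites it to target /_search/scroll. The function name to_scroll_url is made up for illustration, and SCROLL_ID stands in for a real ID. It keeps only the scroll and scroll_id parameters and drops everything else, including the index in the path.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def to_scroll_url(search_url: str) -> str:
    """Rewrite a follow-up request so it goes to /_search/scroll."""
    parts = urlsplit(search_url)
    query = parse_qs(parts.query)
    # Keep only the parameters the scroll endpoint needs.
    kept = {k: query[k][0] for k in ("scroll", "scroll_id") if k in query}
    # Replace the whole path: no index or type before /_search/scroll.
    return urlunsplit(
        (parts.scheme, parts.netloc, "/_search/scroll", urlencode(kept), ""))


fixed = to_scroll_url(
    "http://myserver:9200/mary/_search?scroll=1m&size=100&scroll_id=SCROLL_ID")
print(fixed)
```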

clint

--


(Mary Hamlin) #4

Thanks Radu/Clinton. I finally got it working. I think the issue for me was that I was still using the index/type name in the URL after the initial scan. That resulted in a 404 with a response body of {"_index":"mary","_type":"_search","_id":"scroll","exists":false}, even though I was using the correct API sequence of /_search followed by /_search/scroll. That confused me and made me think I had the API syntax wrong. So besides using /_search followed by /_search/scroll, one also needs to remove any index/type names from the subsequent scroll URLs. Thanks so much for the help!
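
One plausible reading of that 404 body, offered as an assumption rather than something confirmed in the thread: a path such as /mary/_search/scroll (a guess at the URL that was used) has three segments, so it can be read as a document lookup of the form /{index}/{type}/{id}. The toy function below is not Elasticsearch code; it only shows how that reading produces the same field values as the response quoted above.

```python
def naive_document_route(path: str) -> dict:
    """Read a three-segment path as /{index}/{type}/{id} (illustrative only)."""
    index, doc_type, doc_id = path.strip("/").split("/")
    return {"_index": index, "_type": doc_type, "_id": doc_id}


route = naive_document_route("/mary/_search/scroll")
print(route)
```

Under that reading, the server looks for a document with id "scroll" of type "_search" in index "mary", finds none, and answers with exists: false.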

