Help needed with Proper usage of scroll api using python: Getting the same results

BoffinPanda · June 15, 2018, 9:15am

Hello all.

I have to parse the results of a search that returns more than 10000 hits. The natural choice was the scroll API.

I have read the documentation from elastic and have done the following:

import requests

resp=requests.post('http://localhost:9200/netflow*/_search?pretty=true&size=100&scroll=5m')

This gave me a scroll id.

I have stored that scrollID in a variable and did the following:

SearchExp="http://localhost:9200/_search/scroll?pretty=true&scroll=5m&scroll_id="+ScrollID
response = requests.post(SearchExp)

However, every time I run the program, I get the same 100 results (since size=100).

What should I do to get the next set of results and read the full result set beyond 10000 hits?

dadoonet · June 15, 2018, 11:31am

I never tried to pass those parameters as query params. I'm unsure if it's supposed to work.

The documentation says:

POST  /_search/scroll 
{
    "scroll" : "1m", 
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==" 
}

Could you try it that way instead?

If it does not work please share all details (responses and requests included).
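For reference, here is a rough sketch of that body-based call from Python, assuming the `requests` library and a node on localhost:9200 as in the original post. The helper name `next_page` and its `post` parameter are made up for illustration; `post` only exists so the function can be tried with a stub instead of a live cluster.

```python
SCROLL_URL = "http://localhost:9200/_search/scroll"  # assumed local node


def next_page(scroll_id, keep_alive="1m", post=None):
    """Fetch the next page of a scroll by sending the id in the JSON body."""
    if post is None:
        import requests  # imported lazily so a stub can be injected instead
        post = requests.post
    # json= serialises the dict and sets Content-Type: application/json
    response = post(SCROLL_URL, json={"scroll": keep_alive, "scroll_id": scroll_id})
    response.raise_for_status()
    return response.json()
```

Calling `next_page(sid)` repeatedly with the `_scroll_id` from the previous response should walk through the result set until `hits.hits` comes back empty.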

BoffinPanda · June 15, 2018, 12:58pm

Hello. It does work this way, but I have to send the request the URL way, i.e. with query parameters.

The reason is that the generated scroll_id is so long that it does not fit inside an HTTP POST body.

dadoonet · June 15, 2018, 1:15pm

Great. So the documentation says it all IMO.
I believe passing it in the URL has been removed or was never supported. I did not check the code.
Maybe the parameter name is a bit different? Like _scroll_id instead?

The reason is that the generated scroll_id is so long that it does not fit inside an HTTP POST body.

I would expect the opposite: a POST body has no length limit (or at least a very high one), whereas the URL length certainly has a lower limit.
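To illustrate the point, here is a small sketch that builds both forms of the same request without sending anything. The 6000-character `scroll_id` is invented; real ids can get long when a search touches many shards. As far as I know, Elasticsearch caps the request line through the `http.max_initial_line_length` setting (4kb by default, I believe), while the body is subject to a far larger limit.

```python
import json
from urllib.parse import urlencode

scroll_id = "A" * 6000  # made-up id, only here to show the size difference

# Query-parameter form: the whole id becomes part of the URL
url = "http://localhost:9200/_search/scroll?" + urlencode(
    {"scroll": "5m", "scroll_id": scroll_id}
)

# Body form: the URL stays short and the id travels in the JSON payload
body = json.dumps({"scroll": "5m", "scroll_id": scroll_id})

print(len(url))   # length of the request URL
print(len(body))  # length of the request body
```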

BoffinPanda · June 15, 2018, 2:26pm

Yes. I was confused.

However, I fixed my issue as follows:

import json
import sys

import requests

def main(args):
    # Open the scroll context; this first response already holds the first 100 hits
    resp = requests.post('http://localhost:9200/netflow-2018.02.20/_search?pretty=true&size=100&scroll=5m')
    resp = resp.json()
    sid = resp['_scroll_id']
    print(sid)
    hits = resp['hits']['hits']
    headers = {'Content-Type': 'application/json'}
    while hits:  # continue this loop until a page comes back with no hits
        # process the current page of hits here
        data = json.dumps({'scroll': '1m', 'scroll_id': sid})
        response = requests.post('http://localhost:9200/_search/scroll', headers=headers, data=data)
        response = response.json()
        sid = response['_scroll_id']  # always use the most recent scroll id
        hits = response['hits']['hits']
    return 0

if __name__ == '__main__':
    sys.exit(main(sys.argv))
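The same loop can also be written as a generator that clears the scroll context when it is done. This is only a sketch under assumptions: the function name `scroll_hits`, the match-all query, and the `session` parameter (there so a stub can stand in for a real `requests.Session`) are illustrative and not part of the code above.

```python
def scroll_hits(index, base_url="http://localhost:9200", size=100,
                keep_alive="1m", session=None):
    """Yield every hit of a match-all search on `index`, page by page."""
    if session is None:
        import requests  # imported lazily so a stub session can be passed in
        session = requests.Session()
    # The initial search opens the scroll context and returns the first page
    resp = session.post(f"{base_url}/{index}/_search",
                        params={"scroll": keep_alive, "size": size},
                        json={"query": {"match_all": {}}})
    resp.raise_for_status()
    page = resp.json()
    scroll_id = page["_scroll_id"]
    try:
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            resp = session.post(f"{base_url}/_search/scroll",
                                json={"scroll": keep_alive, "scroll_id": scroll_id})
            resp.raise_for_status()
            page = resp.json()
            scroll_id = page["_scroll_id"]  # the id may change between pages
    finally:
        # Free the search context instead of waiting for keep_alive to expire
        session.delete(f"{base_url}/_search/scroll", json={"scroll_id": scroll_id})
```

Usage would look like `for hit in scroll_hits("netflow-2018.02.20"): print(hit["_source"])`.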

