Hello all.
I have to parse a document which has more than 10000 hits. The natural choice was to opt for scroll api.
I have read the documentation from elastic and have done the following:
import requests
resp=requests.post('http://localhost:9200/netflow*/_search?pretty=true&size=100&scroll=5m ')
This gave me a scroll id.
I have stored that scrollID in a variable and did the following:
SearchExp="http://localhost:9200/_search/scroll?pretty=true&scroll=5m&scroll_id= "+ScrollID
response = requests.post(SearchExp)
However, everytime I run the program, I get the same 100 results [since size=100].
What should I do to get the next set of results and read the full document above 10000 ??
dadoonet
(David Pilato)
June 15, 2018, 11:31am
2
I never tried to pass those parameters as query params. I'm unsure if it's supposed to work.
The documentation says:
POST /_search/scroll
{
"scroll" : "1m",
"scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
Could you try it that way instead?
If it does not work please share all details (responses and requests included).
BoffinPanda
(Juggernaut Panda)
June 15, 2018, 12:58pm
3
Hello. It does work this way but I have to use a request i.e. the url way.
The reason is the scroll_id generated is so long that it doesnt support to be fit inside HTTP post method
dadoonet
(David Pilato)
June 15, 2018, 1:15pm
4
BoffinPanda:
It does work this way
Great. So documentation says it all IMO.
I believe this has been removed or was not supported. I did not check the code.
May be the parameter name is a bit different? Like _scroll_id
instead?
The reason is the scroll_id generated is so long that it doesnt support to be fit inside HTTP post method
I would expect the opposite as the length of a POST with body has no limit (or at least super high limit) but the URL length has a lower limit for sure.
Yes. I was confused.
However, I fixed my issue as follows:
import requests,re,string,json
def main(args):
resp=requests.post('http://localhost:9200/netflow-2018.02.20/_search? pretty=true&size=100&scroll=5m')
resp =json.loads(resp.content)
#print (resp)
sid = resp['_scroll_id']
print (sid)
while(True): # continue this loop until hits become zero
headers = {
'Content-Type': 'application/json',
}
data = '\n{\n "scroll" : "1m", \n "scroll_id" : "'+sid+'" \n}'
response = requests.post('http://localhost:9200/_search/scroll', headers=headers, data=data)
response =json.loads(response.content)
if not (response['hits']['hits']):
break;
return 0
if name == 'main ':
import sys
sys.exit(main(sys.argv))
1 Like
system
(system)
Closed
July 13, 2018, 2:26pm
6
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.