Hello all.
I have to read a result set which has more than 10000 hits. The natural choice was to opt for the scroll API.
I have read the documentation from Elastic and have done the following:
import requests
resp = requests.post('http://localhost:9200/netflow*/_search?pretty=true&size=100&scroll=5m')
This gave me a scroll ID.
I stored that scroll ID in a variable and did the following:
SearchExp = "http://localhost:9200/_search/scroll?pretty=true&scroll=5m&scroll_id=" + ScrollID
response = requests.post(SearchExp)
However, every time I run the program, I get the same 100 results (since size=100).
What should I do to get the next set of results and read all of the results beyond the first 10000?
             
dadoonet (David Pilato), June 15, 2018, 11:31am, #2
I never tried to pass those parameters as query params. I'm unsure if it's supposed to work.
The documentation says:
POST /_search/scroll
{
    "scroll" : "1m",
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}

Could you try it that way instead?
If it does not work, please share all details (responses and requests included).
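
For readers following along in Python, the documented request above could be built roughly as follows. This is only a sketch: it assumes a local node on the default port (as in the question), uses the standard library so that it runs without extra packages, and the helper names `build_scroll_request` and `fetch_next_page` are made up for illustration.

```python
import json
import urllib.request

# Assumption: a local node on the default port, as in the original question.
SCROLL_URL = "http://localhost:9200/_search/scroll"


def build_scroll_request(scroll_id, keep_alive="1m"):
    """Build (but do not send) the POST request for the next page of a scroll."""
    # json.dumps takes care of quoting, so no manual string concatenation is needed.
    body = json.dumps({"scroll": keep_alive, "scroll_id": scroll_id}).encode("utf-8")
    return urllib.request.Request(
        SCROLL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_next_page(scroll_id, keep_alive="1m"):
    """Send the request and return the decoded JSON reply (needs a running node)."""
    with urllib.request.urlopen(build_scroll_request(scroll_id, keep_alive)) as resp:
        return json.load(resp)
```

With the requests library used in the question, the same call would be `requests.post(SCROLL_URL, json={"scroll": "1m", "scroll_id": scroll_id})`.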
             
BoffinPanda (Juggernaut Panda), June 15, 2018, 12:58pm, #3
Hello. It does work this way, but I have to make the request the URL way, i.e. with the parameters in the query string.
The reason is that the generated scroll_id is so long that it does not seem to fit inside an HTTP POST request.
             
dadoonet (David Pilato), June 15, 2018, 1:15pm, #4
BoffinPanda:

It does work this way

Great. So the documentation says it all, IMO.
I believe passing scroll_id as a query parameter has been removed or was never supported. I did not check the code.
Maybe the parameter name is a bit different? Like _scroll_id instead?

BoffinPanda:

The reason is the scroll_id generated is so long that it doesn't fit inside an HTTP POST request

I would expect the opposite: a POST with a body has no length limit (or at least a very high one), whereas the URL length definitely has a lower limit.
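
To make the size argument concrete, here is a small offline sketch comparing the two ways of sending the ID. The 5000-character `scroll_id` is a hypothetical stand-in (real IDs are opaque and their length varies), and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

# Hypothetical stand-in for a long scroll ID; real IDs are opaque strings
# whose length depends on the cluster and the search.
scroll_id = "A" * 5000

# Variant 1: everything in the query string, so the URL grows with the ID.
url_with_params = "http://localhost:9200/_search/scroll?" + urlencode(
    {"scroll": "5m", "scroll_id": scroll_id}
)

# Variant 2: a short, fixed URL, with the ID carried in the JSON body.
url_plain = "http://localhost:9200/_search/scroll"
body = json.dumps({"scroll": "5m", "scroll_id": scroll_id})

print("URL length with query params:", len(url_with_params))
print("URL length with JSON body:   ", len(url_plain))
print("JSON body length:            ", len(body))
```

In the second variant the URL stays the same length no matter how long the ID gets, which is why the body form is the safer choice for long scroll IDs.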
             
Yes, I was confused.
However, I fixed my issue as follows:
import sys

import requests


def main(args):
    # Open the scroll context; the first page of hits comes back with this response.
    resp = requests.post('http://localhost:9200/netflow-2018.02.20/_search?pretty=true&size=100&scroll=5m')
    resp = resp.json()
    sid = resp['_scroll_id']
    print(sid)
    while True:  # continue this loop until a page comes back with no hits
        # Send the scroll ID in the JSON body; json= also sets the Content-Type header.
        data = {'scroll': '1m', 'scroll_id': sid}
        response = requests.post('http://localhost:9200/_search/scroll', json=data)
        response = response.json()
        if not response['hits']['hits']:
            break
        # Carry the most recent scroll ID into the next request.
        sid = response['_scroll_id']
    return 0


if __name__ == '__main__':
    sys.exit(main(sys.argv))
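
The same loop can also be wrapped in a generator so that the caller simply iterates over hits. This is a sketch under assumptions, not an official client: `scroll_hits` and `requests_post_json` are hypothetical names, the search is a plain match-all, and the HTTP call is passed in as a parameter so the logic can be exercised without a running cluster.

```python
def requests_post_json(url, body):
    """Default transport: POST a JSON body with requests and decode the reply."""
    import requests  # imported here so the rest of the sketch works without it
    return requests.post(url, json=body).json()


def scroll_hits(index, page_size=100, keep_alive="1m",
                base_url="http://localhost:9200", post_json=requests_post_json):
    """Yield every hit of a match-all search on `index`, one page at a time."""
    # The initial search opens the scroll context and returns the first page.
    page = post_json(
        f"{base_url}/{index}/_search?scroll={keep_alive}",
        {"size": page_size, "query": {"match_all": {}}},
    )
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Always send the scroll ID from the latest response.
        page = post_json(
            f"{base_url}/_search/scroll",
            {"scroll": keep_alive, "scroll_id": page["_scroll_id"]},
        )
```

Usage would look like `for hit in scroll_hits("netflow-2018.02.20"): print(hit["_source"])`. Once the loop is done, the scroll context can be released early with the clear-scroll API rather than waiting for the keep-alive to expire.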
             
              
                system  
                (system)
                  Closed 
               
              
                  
                    July 13, 2018,  2:26pm
                   
                   
              6 
               
             
            
              This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.