Response 400 while using Scroll API with Python - scroll_id too long

benecrom · July 17, 2019, 2:31pm

Hi all,

I'm trying to use the scroll API from my Python application.
My Python code looks like this:

import logging

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'elastic', 'port': 9999}])

# query is the JSON string returned by prep_query(), shown further down
data = es.search(index="index*",
                 body=query,
                 scroll="2m",
                 size=1000,
                 request_timeout=3000)

sid = data['_scroll_id']
batch_n = 0  # number of scroll batches processed so far

# do something w/ current batch of hits (data) ...

while True:
    logging.info(sid)
    data = es.scroll(scroll_id=sid, scroll='2m')

    scroll_size = len(data['hits']['hits'])

    # If no data was collected stop execution
    if scroll_size == 0:
        break

    # do something w/ current batch of hits (data) ...

    # Update the scroll ID and the batch counter
    sid = data['_scroll_id']
    batch_n += 1
    if batch_n % 1000 == 0:
        logging.info("Batch: {}".format(batch_n))

The query I run is as follows:

def prep_query(start_date, start_hour, end_date, end_hour):
    return """{{
      "size": 0,
      "query": {{
        "bool": {{
          "must": [
            {{
              "match_all": {{}}
            }},
            {{
              "match_phrase": {{
                "event": {{
                  "query": "STREAM"
                }}
              }}
            }},
            {{
              "range": {{
                "ts": {{
                  "gte": "{0} {1}",
                  "lte": "{2} {3}",
                  "format": "yyyy-MM-dd HH:mm:ss.SSS"
                }}
              }}
            }}
          ],
          "must_not": []
        }}
      }},
      "_source": {{
        "excludes": []
      }}
    }}""".format(start_date, start_hour, end_date, end_hour)
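As an aside, the same query can be built as a plain Python dict, which avoids the doubled braces needed by str.format. This is only a sketch: the name prep_query_dict is hypothetical, and it assumes the client serializes a dict passed as body (which elasticsearch-py does). The fields are copied unchanged from the string version above.

```python
import json


def prep_query_dict(start_date, start_hour, end_date, end_hour):
    """Build the same query as prep_query(), but as a dict (hypothetical helper)."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "must": [
                    {"match_all": {}},
                    {"match_phrase": {"event": {"query": "STREAM"}}},
                    {
                        "range": {
                            "ts": {
                                "gte": "{} {}".format(start_date, start_hour),
                                "lte": "{} {}".format(end_date, end_hour),
                                "format": "yyyy-MM-dd HH:mm:ss.SSS",
                            }
                        }
                    },
                ],
                "must_not": [],
            }
        },
        "_source": {"excludes": []},
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the request body.
    print(json.dumps(
        prep_query_dict("2019-07-01", "00:00:00.000", "2019-07-02", "00:00:00.000"),
        indent=2))
```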

This produces an error that I'm not sure how to solve. Please also assume that I can't edit the Elasticsearch configuration.
Here's the error:

[2019-07-17 16:08:58,857] {logging_mixin.py:95} INFO - [2019-07-17 16:08:58,856] {base.py:146} WARNING - GET http://elastic01.skytech.local:9200/_search/scroll?scroll=2m&scroll_id=DnF1ZXJ5VGhlb ..(very long hash).. Zad0E%3D [status:400 request:0.002s]
[2019-07-17 16:08:58,857] {__init__.py:1580} ERROR - RequestError(400, 'too_long_frame_exception', 'An HTTP line is larger than 4096 bytes.')

Any help on this would be much appreciated. Thanks a lot!
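For reference, the log above shows the scroll ID travelling in the URL query string, which is what pushes the HTTP request line past the 4096-byte limit (the server-side setting is http.max_initial_line_length). One possible workaround that needs no server change is to send the scroll ID in the request body instead. The sketch below assumes an elasticsearch-py version whose scroll() accepts a body argument (the 6.x and 7.x clients do, as far as I know); the helper name iter_scroll_hits is hypothetical.

```python
def iter_scroll_hits(es, index, query, scroll="2m", size=1000):
    """Yield batches of hits, passing the scroll ID in the request body.

    `es` is expected to be an Elasticsearch client; any object with
    compatible search() and scroll() methods works.
    """
    data = es.search(index=index, body=query, scroll=scroll, size=size)
    while data["hits"]["hits"]:
        yield data["hits"]["hits"]
        # The scroll ID goes into the body, so it never appears in the URL.
        data = es.scroll(body={"scroll_id": data["_scroll_id"], "scroll": scroll})
```

It could be used as: for batch in iter_scroll_hits(es, "index*", query): followed by whatever processing each batch needs. The loop stops on the first empty batch, like the original while loop.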

system · August 14, 2019, 2:31pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
