Hi all,
I'm trying to use the scroll API from my Python application.
My Python code looks like this:
import logging

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'elastic', 'port': '9999'}])
data = es.search(index="index*",
                 body=query,
                 scroll="2m",
                 size=1000,
                 request_timeout=3000)
sid = data['_scroll_id']
batch_n = 0  # counts the scroll batches processed after the initial search
# do something w/ current batch of hits (data) ...
while True:
    logging.info(sid)
    data = es.scroll(scroll_id=sid, scroll='2m')
    scroll_size = len(data['hits']['hits'])
    # If no data was collected, stop execution
    if scroll_size == 0:
        break
    else:
        # do something w/ current batch of hits (data) ...
        batch_n += 1
        # Update the scroll ID
        sid = data['_scroll_id']
        if batch_n % 1000 == 0:
            logging.info("Batch: {}".format(batch_n))
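For comparison, here is a minimal sketch of the same scroll loop written as a generator. `iter_scroll_batches` is a hypothetical name, and passing the scroll ID as `body={"scroll_id": ..., "scroll": ...}` instead of `scroll_id=` is an assumption that depends on the elasticsearch-py version (the 6.x and 7.x clients accept a `body` argument on `scroll()` and `clear_scroll()`). The idea is that the ID then travels in the request body rather than in the URL.

```python
from typing import Any, Dict, Iterator, List


def iter_scroll_batches(es: Any, index: str, query: Any,
                        scroll: str = "2m", size: int = 1000,
                        **search_kwargs: Any) -> Iterator[List[Dict[str, Any]]]:
    """Yield one list of hits per scroll page (hypothetical helper)."""
    data = es.search(index=index, body=query, scroll=scroll, size=size,
                     **search_kwargs)
    sid = data["_scroll_id"]
    try:
        # Stop as soon as a page comes back empty, like the original loop.
        while data["hits"]["hits"]:
            yield data["hits"]["hits"]
            # Assumption: the scroll ID goes in the request body, so a long ID
            # does not end up in the HTTP request line.
            data = es.scroll(body={"scroll_id": sid, "scroll": scroll})
            sid = data["_scroll_id"]
    finally:
        # Release the server-side scroll context when iteration ends.
        es.clear_scroll(body={"scroll_id": sid})
```

Each yielded item is one page of hits, so `for batch_n, hits in enumerate(iter_scroll_batches(es, "index*", query, request_timeout=3000), start=1):` provides the batch counter used for logging.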
The query I run is as follows:
def prep_query(start_date, start_hour, end_date, end_hour):
    return """{{
        "size": 0,
        "query": {{
            "bool": {{
                "must": [
                    {{
                        "match_all": {{}}
                    }},
                    {{
                        "match_phrase": {{
                            "event": {{
                                "query": "STREAM"
                            }}
                        }}
                    }},
                    {{
                        "range": {{
                            "ts": {{
                                "gte": "{0} {1}",
                                "lte": "{2} {3}",
                                "format": "yyyy-MM-dd HH:mm:ss.SSS"
                            }}
                        }}
                    }}
                ],
                "must_not": []
            }}
        }},
        "_source": {{
            "excludes": []
        }}
    }}""".format(start_date, start_hour, end_date, end_hour)
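Building the query with `str.format` means every literal brace has to be doubled. A minimal sketch of the same query as a plain Python dict, which elasticsearch-py accepts as `body`, avoids that; `prep_query_dict` is a hypothetical name. It keeps `"size": 0` exactly as in the string version; since the search call also passes `size=1000`, it may be worth checking which of the two values the cluster actually applies.

```python
from typing import Any, Dict


def prep_query_dict(start_date: str, start_hour: str,
                    end_date: str, end_hour: str) -> Dict[str, Any]:
    """Same query as prep_query, built as a dict (hypothetical helper)."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "must": [
                    {"match_all": {}},
                    {"match_phrase": {"event": {"query": "STREAM"}}},
                    {
                        "range": {
                            "ts": {
                                # Date and time are joined with a space to
                                # match the "format" pattern below.
                                "gte": f"{start_date} {start_hour}",
                                "lte": f"{end_date} {end_hour}",
                                "format": "yyyy-MM-dd HH:mm:ss.SSS",
                            }
                        }
                    },
                ],
                "must_not": [],
            }
        },
        "_source": {"excludes": []},
    }
```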
I'm getting an error and I'm not sure how to solve it. Please also assume that I can't edit the Elasticsearch configuration.
Here's the error:
[2019-07-17 16:08:58,857] {logging_mixin.py:95} INFO - [2019-07-17 16:08:58,856] {base.py:146} WARNING - GET http://elastic01.skytech.local:9200/_search/scroll?scroll=2m&scroll_id=DnF1ZXJ5VGhlb ..(very long hash).. Zad0E%3D [status:400 request:0.002s]
[2019-07-17 16:08:58,857] {__init__.py:1580} ERROR - RequestError(400, 'too_long_frame_exception', 'An HTTP line is larger than 4096 bytes.')
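The log shows the scroll ID being sent in the query string, and a scroll ID generally grows with the number of shards the search touches, which a wildcard pattern like `index*` can make large. A rough sketch for estimating whether a given ID still fits, assuming the request line has the form `GET /_search/scroll?<query> HTTP/1.1` and using the 4096-byte limit quoted in the error; the function names are hypothetical.

```python
from urllib.parse import urlencode

# Limit quoted in the error message ("larger than 4096 bytes").
MAX_INITIAL_LINE_BYTES = 4096


def scroll_request_line_length(scroll_id: str, scroll: str = "2m") -> int:
    """Approximate bytes in the request line when the ID is in the URL."""
    # urlencode percent-encodes characters such as '=' (as %3D), which is
    # why the logged scroll_id ends in "%3D".
    query = urlencode({"scroll": scroll, "scroll_id": scroll_id})
    line = f"GET /_search/scroll?{query} HTTP/1.1"
    return len(line.encode("ascii"))


def scroll_id_fits_in_url(scroll_id: str, scroll: str = "2m") -> bool:
    """True if the estimated request line is within the quoted limit."""
    return scroll_request_line_length(scroll_id, scroll) <= MAX_INITIAL_LINE_BYTES
```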
Any help would be much appreciated. Thanks a lot!