I am trying to get data from Elasticsearch using the elasticsearch-dsl Python library. I need all the data from the last 15 minutes. The issue is that retrieving the data is extremely slow: it takes a very long time for 2.2 million hits. Here is my code:
import time

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

# Result lists filled by the scan loop below
source_ip, destination_ip, source_port, destination_port = [], [], [], []

start_time = time.time()
try:
    client = Elasticsearch(['IP_HERE'])
    # All documents from the last 15 minutes
    s = Search(using=client, index="firewallv2-*", doc_type='doc').filter(
        'range', **{'@timestamp': {'gte': 'now-15m', 'lt': 'now'}})
    response = s.execute()
except Exception as e:
    print(e)
    print("error in getting data from FIREWALL")
try:
    # scan() iterates over every matching hit
    for hit1 in s.scan():
        source_ip.append(hit1.to_dict().get('Source IP'))
        destination_ip.append(hit1.to_dict().get('Destination IP'))
        destination_port.append(hit1.to_dict().get('Destination Port'))
        source_port.append(hit1.to_dict().get('Source Port'))
except Exception as e:
    print("not able to parse json data")
elapsed_time = time.time() - start_time
print("Time to get data from server " + str(elapsed_time))
There is more to the code, but I am only posting the main slow component; the rest is pure Python. Below is my output:
Time to get data from server 893.599892855
Time to store data into variables 27.647258997
Time to process for loop 9.32531404495
All times are in seconds. As you can see, retrieving the 2.2 million hits takes a huge amount of time.
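To put the first timing in perspective, here is a rough throughput calculation. It assumes the 2.2 million figure is the total number of hits scanned during the 893.6 seconds reported above; the variable names are only for illustration.

```python
# Rough throughput implied by the timings above.
total_hits = 2_200_000          # approximate hit count (assumed to be the scanned total)
fetch_seconds = 893.599892855   # "Time to get data from server"

hits_per_second = total_hits / fetch_seconds
ms_per_hit = 1000 * fetch_seconds / total_hits

print(f"{hits_per_second:.0f} hits/s, {ms_per_hit:.3f} ms per hit")
```

That works out to roughly 2,460 hits per second, or about 0.4 ms per hit.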
I also tried using bulk_size=10000, and changing bulk_size to various other values, but without success.
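For comparison, here is a minimal sketch of a variant of the loop above. It is not a confirmed fix. It makes three changes: it asks Elasticsearch for only the four fields that are used (`.source(...)`), it sets the page size through `.params(size=...)`, which, as far as I understand elasticsearch-dsl, is forwarded to the underlying scan helper, and it calls `to_dict()` once per hit instead of four times. The names `FIELDS`, `build_search`, `collect_columns` and `page_size` are hypothetical, and the `doc_type` argument from my code is left out here.

```python
from typing import Any, Dict, Iterable, List

# The four fields the original loop reads from each hit.
FIELDS = ["Source IP", "Destination IP", "Source Port", "Destination Port"]


def build_search(host: str, page_size: int = 5000):
    """Build a Search over the last 15 minutes that returns only FIELDS."""
    # Imported lazily so collect_columns() can be used without the library.
    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search

    client = Elasticsearch([host])
    return (
        Search(using=client, index="firewallv2-*")
        .filter("range", **{"@timestamp": {"gte": "now-15m", "lt": "now"}})
        .source(FIELDS)          # fetch only the needed fields from _source
        .params(size=page_size)  # assumed to be passed on to scan() as the page size
    )


def collect_columns(hits: Iterable[Any]) -> Dict[str, List[Any]]:
    """Collect one list per field, converting each hit to a dict only once."""
    columns: Dict[str, List[Any]] = {name: [] for name in FIELDS}
    for hit in hits:
        doc = hit.to_dict()  # single conversion per hit
        for name in FIELDS:
            columns[name].append(doc.get(name))  # None if the field is missing
    return columns
```

Usage would look like `columns = collect_columns(build_search("IP_HERE").scan())`, after which `columns["Source IP"]` plays the role of `source_ip` in my code. The value of `page_size` is a guess and would need tuning.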