Hello! This is my first post here in the Elastic discussion area, so apologies for any mistakes.
I'm fairly new to the ELK Stack. I'm trying to parse a .CSV file into Elasticsearch using the Python API, but it's taking far too long to index even a small number of logs (312 seconds for 30,000 log lines). I'm running Elasticsearch 5.6.3 on Ubuntu 16.04 with 6 GB of RAM.
The main idea is to convert each row into a JSON document and then index it into Elasticsearch. The code:
import time
import json
import pandas as pd
from elasticsearch import Elasticsearch


class Storage(object):
    def __init__(self, user, password):
        self.user = user
        self.password = password
        self.es = Elasticsearch(http_auth=(user, password))

    def get_info(self, log=False):
        # Print and return cluster info to check the connection
        info = self.es.info()
        if info:
            print(json.dumps(info, indent=3))
        return info

    def index(self, index, doc_type, _id, body):
        # Index a single document (one HTTP request per call)
        status = self.es.index(index=index, doc_type=doc_type, id=_id, body=body)
        return status


if __name__ == '__main__':
    st = Storage('elastic', 'changeme')
    print(st.get_info())

    # file_to_parse is the path to the .CSV file
    df = pd.read_csv(file_to_parse, low_memory=False)
    aux = df.to_dict('records')

    index = 1
    begin = time.time()
    for register in aux:
        reg = json.dumps(register)
        res = st.index('someindex', 'log', index, reg)
        index += 1
    print("Finished", time.time() - begin)
What could possibly be improved here?
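I've read that the helpers.bulk function from the Python client might be much faster than calling index() once per row, since it batches many documents into a single request. This is only a rough sketch of what I think it would look like (reusing the st.es client, the aux list, and the index/type names from the code above), so please correct me if I'm wrong:

from elasticsearch.helpers import bulk

# One action per CSV row; the whole batch is sent in a few requests
# instead of one request per document.
actions = (
    {
        "_index": "someindex",
        "_type": "log",
        "_id": i,
        "_source": record,  # dict from df.to_dict('records'), no json.dumps needed
    }
    for i, record in enumerate(aux, start=1)
)

success, errors = bulk(st.es, actions)
print("Indexed", success, "documents")

Would that be the right direction, or is there something else wrong with my setup?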