CSV parsing taking too long using the Python API

Hello! This is my first post here in the Elastic discussion area, sorry for any mistakes.

I'm fairly new to the ELK Stack. I'm trying to load a .CSV file into Elasticsearch using the Python API. The problem is that it's taking way too long to index just a few logs (312 seconds for 30,000 logs). I'm using Elasticsearch 5.6.3 running on Ubuntu 16.04 with 6 GB RAM.
The main idea is to convert each row into a JSON document and then index it into Elasticsearch. The code:

import time
import json
import pandas as pd
from elasticsearch import Elasticsearch

class Storage(object):
    
    def __init__(self,user,password):
        
        self.user = user
        self.password = password
        self.es = Elasticsearch(http_auth=(user, password))
        
    def get_info(self, log=False):
        
        info = self.es.info()
        if info:
            print(json.dumps(info, indent=3))
            return info
    
    def index(self, index, doc_type, _id, body):
        # Index a single document; each call is one HTTP request
        status = self.es.index(index=index, doc_type=doc_type, id=_id, body=body)
        return status

        
if __name__ == '__main__':
    
    st = Storage('elastic','changeme')
    print(st.get_info())
    df = pd.read_csv(file_to_parse, low_memory=False)  # file_to_parse: path to the CSV file
    aux = df.to_dict('records')                        # one dict per CSV row
    
    index = 1
    begin = time.time()
    for register in aux:
        reg = json.dumps(register)                      # serialize the row to JSON
        res = st.index('someindex', 'log', index, reg)  # one indexing request per document
        index += 1
        
    print("Finished", time.time() - begin)

What could possibly be improved here?

Use the bulk API to send multiple documents per indexing request (docs). This is much more efficient than indexing each document individually.
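
As a minimal sketch of what that could look like with the elasticsearch-py client, here is the per-row loop replaced by the helpers.bulk helper. The index name 'someindex', the type 'log', and the file_to_parse variable are taken from your snippet; chunk_size=1000 is just an illustrative value:

import pandas as pd
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch(http_auth=('elastic', 'changeme'))
df = pd.read_csv(file_to_parse, low_memory=False)

def actions(records):
    # Yield one action per CSV row; helpers.bulk groups them into
    # batched _bulk requests instead of one HTTP round trip per document.
    for i, record in enumerate(records, start=1):
        yield {
            "_index": "someindex",
            "_type": "log",
            "_id": i,
            "_source": record,
        }

success, errors = bulk(es, actions(df.to_dict('records')), chunk_size=1000)
print("Indexed", success, "documents")

You can tune chunk_size to trade memory per request against the number of requests, and if a single connection still isn't fast enough, the parallel_bulk helper in elasticsearch.helpers is another option.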

