How to load 2 million records of CSV data into Elasticsearch on CentOS 7?


(Anjitha) #1

What is the best way to load 2 million records of CSV data into Elasticsearch?

Can I use Spark for this purpose?
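For reference, one common approach that doesn't need Spark is the bulk helpers in the official Elasticsearch Python client. The sketch below is just that, a sketch: the host, index name, column layout, and chunk size are all assumptions, not anything from this thread.

```python
import csv


def generate_actions(csv_path, index_name):
    """Yield one bulk-index action per CSV row (field names come from the header)."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            yield {"_index": index_name, "_source": row}


def load_csv(csv_path, index_name="csv-data", host="http://localhost:9200"):
    # Requires: pip install elasticsearch
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import streaming_bulk

    es = Elasticsearch(host)
    ok_count = 0
    # streaming_bulk sends the documents in batches, so all 2M rows
    # never have to sit in memory at once.
    for ok, _ in streaming_bulk(
        es, generate_actions(csv_path, index_name), chunk_size=5000
    ):
        if ok:
            ok_count += 1
    return ok_count
```

Because the rows are streamed in batches, a few million records is mostly a question of how fast the cluster can index, not of client memory.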


(Jason Wee) #2

Maybe convert the CSV to JSON and then stream it to ES? https://github.com/elastic/stream2es
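The CSV-to-JSON step mentioned above could look like the sketch below (file names are placeholders, and treating every value as a string is an assumption):

```python
import csv
import json


def csv_to_json_lines(csv_path, json_path):
    """Convert a CSV file to newline-delimited JSON, one document per line."""
    with open(csv_path, newline="") as src, open(json_path, "w") as dst:
        # The CSV header row becomes the JSON field names; all values stay strings.
        for row in csv.DictReader(src):
            dst.write(json.dumps(row) + "\n")
```

The resulting newline-delimited JSON is the shape tools like stream2es (and the bulk API itself) expect as input.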


(David Pilato) #3

Have a look also at http://david.pilato.fr/blog/2015/04/28/exploring-capitaine-train-dataset/


(Anjitha) #4

Thank you, but this solution takes ages to run for millions of records. Still, thank you; at least it works for small amounts of data.


(Anjitha) #5

@Jason Thank you. I couldn't make it work. Spark also has streaming:
https://spark.apache.org/docs/latest/streaming-programming-guide.html

Will this work for a large amount of data?


(David Pilato) #6

You can increase the number of Logstash workers and set it to the number of CPUs you have on your machine.
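For context, a minimal Logstash pipeline for a job like this might look like the sketch below; the file path, column names, and index name are assumptions. The worker count mentioned above is set with the `-w` flag (or `pipeline.workers` in `logstash.yml`), not in the pipeline file itself.

```
input {
  file {
    path => "/path/to/data.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"   # re-read the file from the start on every run
  }
}

filter {
  csv {
    separator => ","
    columns => ["id", "name", "value"]   # assumed column names
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "csv-data"
  }
}
```

Then run it with one worker per CPU core, e.g. `bin/logstash -w 8 -f csv-pipeline.conf` on an 8-core machine.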


(Jason Wee) #7

that's sad... or you can pay someone to do it for you :slight_smile:

