Importing Big Data in Elasticsearch


(Sanjay Bhosale) #1

Hi,

I have a 3-node Elasticsearch cluster and around 1 TB of data across 16 indexes, with many ids. While parsing, document ids may be repeated, and I need to update the existing document by concatenating the new values. I tried importing with a single cluster, and even a small amount of data (approx. 10 GB) took 3 weeks, as there were around 30 lakh ids in it. Now I want to have my complete data set in Elasticsearch. Is there another way to import the data? Is there a location where I can put all the data so that it is imported directly?


(Magnus Bäck) #2

So you can't just do bulk insertions but need to look up whether a document already exists and, if so, update it? Even so, three weeks for 3M documents and 10 GB of data sounds unreasonably slow. It's about two documents per second.
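For reference, the rate estimate can be checked with a few lines of Python. This only restates the figures from the question (3 million ids, three weeks); the variable names are illustrative.

```python
# Figures taken from the original question.
documents = 3_000_000          # 30 lakh ids
seconds = 3 * 7 * 24 * 3600    # three weeks of wall-clock time

# Average indexing rate over the whole import.
docs_per_second = documents / seconds
print(f"{docs_per_second:.2f} documents per second")
```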

If you can describe the problem in a bit more detail and outline what your code looks like, it'll be easier to give specific advice. You'll probably want to look into partial updates.
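As a rough sketch of what partial updates through the bulk API could look like: the code below merges repeated ids on the client first, then builds one scripted `update` action with an `upsert` per id, so no separate lookup is needed to check whether the document exists. The index name, the `items` field, and both helper functions are hypothetical, and the `painless` script language assumes a reasonably recent Elasticsearch version. The resulting string would be sent to the `_bulk` endpoint in batches.

```python
import json
from collections import defaultdict


def merge_by_id(records):
    """Combine values for repeated ids client-side, keeping first-seen order."""
    merged = defaultdict(list)
    for doc_id, value in records:
        merged[doc_id].append(value)
    return dict(merged)


def build_bulk_body(index, merged):
    """Build an NDJSON body for the _bulk endpoint using scripted upserts.

    "items" is a hypothetical list field; adjust it to the real mapping.
    """
    lines = []
    for doc_id, values in merged.items():
        # Action line: which document to update.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        # Source line: append to the list if the document exists,
        # otherwise create it from the "upsert" body.
        lines.append(json.dumps({
            "script": {
                "lang": "painless",
                "source": "ctx._source.items.addAll(params.items)",
                "params": {"items": values},
            },
            "upsert": {"items": values},
        }))
    # The bulk API requires a trailing newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    records = [("a", "x"), ("b", "y"), ("a", "z")]
    print(build_bulk_body("myindex", merge_by_id(records)), end="")
```

Merging duplicates before sending means each id is touched at most once per batch, which should help considerably compared with one lookup and one update request per parsed record.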

(Keep in mind that "lakh" isn't well known outside of India and neighboring countries.)

