Upload billions of records to ES in less time

Suppose I have 2 billion records in another database, MySQL. I want to upload this data to ES. By my calculation the upload will take almost 10 days, but I need to finish it within a few hours.
Is it even possible?

Help is always appreciated :slight_smile:

How long does it take to get all the data out of MySQL into e.g. a file?

Thanks for replying!

We can create a file. We are not concerned about how long it takes to write the file from MySQL; that is acceptable. My issue is the next step, which is to upload 2 billion records from a file to ES in only a few hours. You can assume I have one or more files with the data ready.

How large are the documents? How complex are they? Have you defined and optimized the mappings?

What is the calculation you are talking about? 2 billion events in 10 days is just a bit over 2k events per second. That sounds quite low unless the documents are quite large and complex.
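For reference, the arithmetic behind that figure can be checked with a short sketch. The function name `required_rate` is hypothetical, and the 6-hour target is only an assumed stand-in for "a few hours":

```python
def required_rate(total_docs: int, seconds: float) -> float:
    """Average documents per second needed to index total_docs within seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return total_docs / seconds


TOTAL_DOCS = 2_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Rate implied by the 10-day estimate from the original question.
ten_day_rate = required_rate(TOTAL_DOCS, 10 * SECONDS_PER_DAY)

# Rate needed for an assumed 6-hour window (illustrative target only).
six_hour_rate = required_rate(TOTAL_DOCS, 6 * 60 * 60)

print(f"10 days: {ten_day_rate:,.0f} docs/s")   # 10 days: 2,315 docs/s
print(f"6 hours: {six_hour_rate:,.0f} docs/s")  # 6 hours: 92,593 docs/s
```

So going from 10 days to roughly 6 hours means sustaining about 40 times the average throughput.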

The indexing throughput also depends on the specification and size of your cluster. Do you have any details on this?

I would expect extracting 2 billion events from MySQL would take a reasonably long time.
If it potentially takes a lot longer than a few hours to extract the data from MySQL, why does it have to be uploaded to Elasticsearch in a shorter timeframe than that? Can you elaborate a bit more on the use case and requirements?

I appreciate your help!!

Let me share the current situation. First of all, our team is still discussing this. We don't have an ES expert, but the client wants the user data in ES. Currently, all user data is in MySQL, and we need to move it to ES. Please disregard the calculation, the 10 days, and the file. Please give us any direction we should follow to solve this issue (MySQL to ES) in a shorter time span. The data is not complex: just user name, phone, email, and address.

So my fresh question is: how do I move a large amount of data from one DB (MySQL) to ES in less time? The data may or may not be complex (in my case it is not).

Thanks in advance :slight_smile:

What I think @Christian_Dahlqvist is saying is that you need to provide some evidence that the performance bottleneck is related to Elasticsearch at all. It's entirely possible that the limiting factor is the rate you can extract data from MySQL, and you don't need to change anything on the Elasticsearch side to hit your performance target.

The nightly benchmarks show that it is possible to index ≥100,000 documents per second into a well-tuned 3-node Elasticsearch cluster. At that rate, it'd take ~6h to index 2 billion documents. Of course your documents may be more complicated than the ones in this test, or your cluster may be smaller or less well-tuned, so it's not a fair comparison, but that hopefully gives you a rough idea of where to look.
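As a rough sanity check on the ~6h figure, here is a minimal sketch that converts a sustained indexing rate into wall-clock hours. The helper name `hours_to_index` and the two lower rates are assumptions chosen for comparison; only the 100,000 docs/s figure comes from the benchmarks mentioned above:

```python
def hours_to_index(total_docs: int, docs_per_second: float) -> float:
    """Hours needed to index total_docs at a constant docs_per_second."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return total_docs / docs_per_second / 3600


TOTAL_DOCS = 2_000_000_000

# 100_000 is the benchmark rate; 2_000 and 20_000 are illustrative lower rates.
estimates = {rate: hours_to_index(TOTAL_DOCS, rate) for rate in (2_000, 20_000, 100_000)}

for rate, hours in estimates.items():
    print(f"{rate:>7,} docs/s -> {hours:6.1f} h")
```

At 100,000 docs/s this works out to about 5.6 hours, which matches the estimate above, assuming the rate is sustained for the whole load.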

Thanks for the info. I will test this now and reply ASAP :slight_smile:
