I have been asked to index more than 3*10^12 documents into an Elasticsearch cluster. The cluster has 50 nodes, each with 40 cores and 128 GB of memory.
I was able to do it with the _bulk API from multithreaded Python, but I could not get more than 50,000 records per second on one node.
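For illustration, a multithreaded _bulk loader of this kind might look roughly like the sketch below. It is not the exact code in use: it relies only on the Python standard library rather than the official client, and the node address `ES_URL`, the index name `INDEX`, and the `batch_size` and `workers` values are placeholder assumptions.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from urllib.request import Request, urlopen

ES_URL = "http://localhost:9200"  # hypothetical node address
INDEX = "docs"                    # hypothetical index name


def chunks(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def bulk_body(docs, index=INDEX):
    """Build the newline-delimited JSON body for _bulk:
    one action line followed by one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The body must be terminated by a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def send_bulk(docs, url=ES_URL):
    """POST one batch to _bulk and return the parsed JSON response."""
    req = Request(
        f"{url}/_bulk",
        data=bulk_body(docs),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urlopen(req) as resp:
        return json.load(resp)


def index_all(docs, batch_size=5000, workers=8):
    """Send batches from a thread pool; return the number of batches
    whose response reported item-level errors.

    Note: pool.map submits every batch up front, so a real loader for
    a very large input would need to bound the batches in flight.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(send_bulk, chunks(docs, batch_size))
        return sum(1 for r in results if r.get("errors"))
```

Calling `index_all(my_docs)` requires a reachable cluster; `bulk_body` and `chunks` can be checked on their own without one.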
So I want to know:
- What is the fastest way to index data?
- As far as I know, I can send indexing requests to each data node. Does throughput scale linearly with the number of nodes? That is, can I get 50,000 records per second on each node?
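To show why the scaling question matters, here is a rough estimate using the figures above. It assumes perfectly linear scaling across all 50 nodes, which is exactly the part I am unsure about, so the second number is a best case rather than a measurement.

```python
TOTAL_DOCS = 3 * 10**12   # documents to index (from the question)
RATE_PER_NODE = 50_000    # records per second observed on one node
NODES = 50                # nodes in the cluster

SECONDS_PER_DAY = 86_400


def days_to_index(total_docs, docs_per_second):
    """Return how many days it takes to index total_docs at a fixed rate."""
    return total_docs / docs_per_second / SECONDS_PER_DAY


# Rate of a single node, as measured.
one_node = days_to_index(TOTAL_DOCS, RATE_PER_NODE)

# Assumption: throughput grows linearly with the number of nodes.
all_nodes = days_to_index(TOTAL_DOCS, RATE_PER_NODE * NODES)

print(f"one node: {one_node:.1f} days")
print(f"{NODES} nodes, linear scaling: {all_nodes:.1f} days")
```

At the single-node rate the load would take roughly 694 days; with ideal linear scaling it would take about 14 days.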