Hi all,
In my case, I need to handle 3 TB of data in Elasticsearch. We have a job that rotates the index once a day, so each day a new index of roughly 100 GB is created. Over 30 days that gives 30 indices (30 * 100 GB = 3,000 GB). We need to build monthly reports over this data using aggregation queries (mostly terms aggregations), but running those aggregations across the full 3 TB crashes the client (coordinating) node. Could anyone help me with this?
Cluster details:
3 master nodes: 500 vCPU and 1 GB physical memory
5 data nodes: 1 vCPU and 1 GB physical memory
1 client node: 500 vCPU and 1 GB physical memory
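For reference, the monthly report queries look roughly like the sketch below (the `logs-*` index pattern, the `@timestamp` and `user.id` fields, and the bucket size are just placeholders, not our real mapping):

```
GET logs-*/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": { "gte": "now-30d/d", "lt": "now/d" }
    }
  },
  "aggs": {
    "by_user": {
      "terms": { "field": "user.id", "size": 1000 }
    }
  }
}
```

Even with "size": 0 so no hits are returned, the terms buckets from all 30 daily indices have to be merged on the client node, and that is where it runs out of memory.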
That sounds like very limited resources in terms of CPU, memory, and heap for that data size, especially on the data nodes, so I am not surprised you are having issues aggregating over that data set.
I do not know exactly what will be required, so I would recommend gradually increasing resources until performance is sufficient. I would probably start with 16 GB RAM and 8 GB heap for the data nodes and a bit less for the coordinating-only node.
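As an example of what that looks like in practice (assuming a 16 GB data node and a recent Elasticsearch version that supports the jvm.options.d directory), you would set the heap on each data node with something like:

```
# config/jvm.options.d/heap.options on each 16 GB data node:
# give Elasticsearch half the RAM and leave the rest to the OS file system cache
-Xms8g
-Xmx8g
```

Keep -Xms and -Xmx equal so the heap is not resized at runtime, and leave the other half of the RAM to the operating system file system cache, which Lucene relies on heavily.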