Handling TBs of data in Elasticsearch

Hi all,
In my case, I need to handle 3 TB of data in Elasticsearch. We have a job that rotates the index once daily: each day a new index of about 100 GB is created, so over 30 days we accumulate 30 indices (30 × 100 GB = 3000 GB). We need to build monthly reports on this data using aggregation queries (mostly terms aggregations), but running those aggregations over the full 3 TB causes the client node to crash. Could anyone help me with this?
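For context, a monthly report query like the one described would typically hit all 30 daily indices via a wildcard pattern and run a terms aggregation across them. The index pattern and field name below are made up for illustration; a sketch might look like:

```json
POST /daily-index-2024.06.*/_search
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": {
        "field": "category.keyword",
        "size": 100
      }
    }
  }
}
```

Since all shard-level buckets are merged on the coordinating (client) node, this is exactly where memory pressure builds up; Elasticsearch's `composite` aggregation, which pages through buckets instead of returning them all at once, is the usual memory-friendlier alternative for large bucket counts.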

Cluster details:
3 master nodes: 500 vCPU and 1 GB physical memory
5 data nodes: 1 vCPU and 1 GB physical memory
1 client node: 500 vCPU and 1 GB physical memory

What could be the best solution for this?

That sounds like very limited resources in terms of CPU, memory and heap for that data size, especially on the data nodes. I am not surprised you are having issues aggregating over the data set. I would recommend gradually increasing resources until the cluster can comfortably handle your workload.

Thanks @Christian_Dahlqvist for your reply. What is your recommendation for this use case?

I do not know what will be required, so I would recommend gradually increasing resources until performance is sufficient. I would probably start with 16GB RAM and 8GB heap for the data nodes and a bit less for the coordinating-only node.
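For anyone landing here later: the heap size is set in `config/jvm.options` (or via the `ES_JAVA_OPTS` environment variable). The 8 GB heap on a 16 GB data node suggested above would look like this, with min and max set equal as Elasticsearch recommends:

```
# config/jvm.options — 8 GB heap on a 16 GB node
-Xms8g
-Xmx8g
```

Keeping the heap at or below roughly half of physical RAM leaves the remainder for the filesystem cache, which Elasticsearch relies on heavily for search and aggregation performance.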

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.