I have a two-node cluster with 600 GB of space between them on the data path. I'm setting up a data stream where each index has 2 shards and one replica. Does a replica take the same space as the primary shards? Should I size my data stream with only this 600 GB in mind, or do I have to account for the replica size as well? If so, will a replica take the same amount of space as its primary shard?
Yes, a replica takes the same space as its primary, so you should always take it into account.
You should also take the disk watermark levels into consideration.
One other thing: a two-node cluster is not fault tolerant, because you cannot afford to lose the master node, so it does not make much sense to have replicas in this case.
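To put rough numbers on the replica and watermark points above, here is a minimal sketch of the arithmetic. It assumes the 600 GB is the combined data-path capacity of both nodes, that shards balance evenly, and that the low disk watermark is at the Elasticsearch default of 85%. The function name and parameters are hypothetical, chosen only for illustration.

```python
def max_primary_data_gb(total_disk_gb, replicas, watermark_percent):
    """Estimate how much primary (unreplicated) data fits on the cluster.

    total_disk_gb:     combined data-path capacity of all data nodes
    replicas:          number of replica copies per primary shard
    watermark_percent: disk usage to stay below (assumed: 85, the default
                       low watermark)
    """
    # Space that can be used before the watermark is reached.
    usable_gb = total_disk_gb * watermark_percent / 100
    # Every primary is stored (1 + replicas) times in total.
    return usable_gb / (1 + replicas)


# 600 GB total, one replica, stay below the 85% low watermark.
with_replica = max_primary_data_gb(600, 1, 85)
# Same cluster with replicas disabled.
without_replica = max_primary_data_gb(600, 0, 85)

print(f"With 1 replica: about {with_replica:.0f} GB of primary data")
print(f"With 0 replicas: about {without_replica:.0f} GB of primary data")
```

Under these assumptions, one replica leaves room for roughly 255 GB of primary data, versus roughly 510 GB with no replicas. Real clusters also need headroom for segment merges and uneven shard placement, so treat this as an upper bound.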
Actually, I also have a voting-only master node set up. But thanks for pointing that out.
One more thing, Leandro: since I'm using data streams, will the size of the data be reduced in the warm or cold phases? If so, by how much?
I do not use data streams, but if you are using ILM, you can apply a force merge and compress the stored fields by selecting that option.
Just keep in mind that the force merge can be resource intensive, and the compression will also give you slower performance.
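As a rough illustration of the force merge and compression option mentioned above, the sketch below builds an ILM policy body that force merges to one segment and switches to the best_compression codec in the warm phase. The function name, the rollover thresholds, and the 7d warm age are placeholder assumptions, not recommendations; the resulting JSON would be sent to the ILM policy endpoint (PUT _ilm/policy/<policy-name>). How much space this saves depends on the data, so no figure is implied here.

```python
import json


def build_ilm_policy(warm_after="7d", max_segments=1):
    """Build an ILM policy body (hypothetical helper, illustrative values)."""
    return {
        "policy": {
            "phases": {
                # Hot phase: roll the data stream over to a new backing index.
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_age": "1d",
                            "max_primary_shard_size": "50gb",
                        }
                    }
                },
                # Warm phase: merge segments and recompress stored fields.
                "warm": {
                    "min_age": warm_after,
                    "actions": {
                        "forcemerge": {
                            "max_num_segments": max_segments,
                            "index_codec": "best_compression",
                        }
                    },
                },
            }
        }
    }


policy = build_ilm_policy()
print(json.dumps(policy, indent=2))
```

The force merge runs once per index when it enters the warm phase, which is where the resource cost mentioned above comes from.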