I am facing a technology stack decision for an upcoming feature, and I am wondering if Elasticsearch is the right choice.
My worst-case scenario would include around 500 million parent documents with ~1.5 billion child documents added to the cluster daily (each containing two big ints and a datetime). I would need to retain at least 30 days' worth of data, and there will be heavy aggregations over children based on parent values (e.g. "give me all children that did x whose parent has been calculated by y since date z").
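For context, a minimal sketch of how that parent/child relationship would typically be modeled in Elasticsearch with a join field. The relation and field names (`parent_doc`/`child_doc`, `value_a`, `value_b`, `created_at`) are placeholders for illustration, not our actual schema:

```python
# Index mapping sketch: one join field relating parents to children,
# plus the per-child payload described above (two big ints + a datetime).
mapping = {
    "mappings": {
        "properties": {
            "relation": {
                "type": "join",
                "relations": {"parent_doc": "child_doc"},  # parent -> child
            },
            "value_a": {"type": "long"},      # big int #1
            "value_b": {"type": "long"},      # big int #2
            "created_at": {"type": "date"},   # datetime per child
        }
    }
}
```

The catch with this model is that parents and children must live in the same shard (routing by parent id), which is exactly what makes the joins below expensive at scale.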
We currently have a smaller-scale version of this system running on a cluster with 23 20-core machines dedicated to the index containing the data: with 75 shards and one replica we have 5.4 billion documents totaling 1.2 TB, and it's performing... well... not as well as we hoped.
The queries are taking two to seven seconds - thanks to the parent-child joins - and denormalizing the data is not an option, as we are constantly ingesting around 2k docs per second, which would result in huge write amplification. So in its current state this will not scale... at all. I'm quite fond of Elasticsearch and willing to push its limits, but apparently parent-child is not exactly a strong suit of ES (sadly!).
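To make the slow query shape concrete, here is a hedged sketch of what "children that did x whose parent matches y since date z" looks like with a `has_parent` clause. All field names and values (`value_a`, `calculated_by`, `created_at`, the placeholder `42`) are illustrative assumptions, not our real schema:

```python
# Query sketch: filter children directly, then join up to the parent
# via has_parent. The has_parent clause is the expensive part, since
# it must resolve the join against the parent docs on every shard.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"value_a": 42}},  # child "did x"
                {"range": {"created_at": {"gte": "now-30d"}}},  # "since z"
                {
                    "has_parent": {
                        "parent_type": "parent_doc",
                        # parent "calculated by y"
                        "query": {"term": {"calculated_by": "y"}},
                    }
                },
            ]
        }
    }
}
```

Even as pure filters (no scoring), queries of this shape are where our two-to-seven-second latencies come from.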
Does anyone have a recommendation for a technology stack that could be a lifesaver for us? Maybe a PostgreSQL cluster would be a better fit for this use case? Maybe Cloud Spanner?
Any feedback/ideas welcome!