We all know that Elasticsearch is basically a NoSQL database. That means there are no relations between documents, and data needs to be denormalized and flattened to be indexed properly.
Even so, sometimes we need relations, and fortunately Elasticsearch understands this need very well.
One-level objects can be denormalized quite easily. But things start to differ when dealing with an array of items.
Elastic suggests two strategies to handle this difficulty:
1- Parent-Child documents
2- Nested documents
We are well aware that nested documents are faster to query than Parent-Child documents, because nested documents are stored in the same Lucene block as their root document, whereas with Parent-Child the documents are on the same shard but may not always be in the same Lucene block.
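For reference, here is a minimal sketch of how the two mappings differ, written as plain Python dictionaries that could be sent as index-creation bodies. The field names (`offers`, `seller`, `price`, `doc_relation`) and the relation names (`product`, `offer`) are hypothetical and only there for illustration; they are not my real schema.

```python
import json

# Nested approach: children live inside the root document as a "nested" field.
# All field names here are hypothetical.
NESTED_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "keyword"},
            "offers": {
                "type": "nested",
                "properties": {
                    "seller": {"type": "keyword"},
                    "price": {"type": "float"},
                },
            },
        }
    }
}

# Parent-Child approach: parents and children are separate documents in the
# same index, linked through a "join" field (hypothetically "doc_relation").
JOIN_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "keyword"},
            "seller": {"type": "keyword"},
            "price": {"type": "float"},
            "doc_relation": {
                "type": "join",
                "relations": {"product": "offer"},
            },
        }
    }
}


def child_document(parent_id, seller, price):
    """Build the body and routing value for one child ("offer") document."""
    body = {
        "seller": seller,
        "price": price,
        "doc_relation": {"name": "offer", "parent": parent_id},
    }
    # A child must be indexed on the same shard as its parent,
    # so the parent id is used as the routing value.
    return {"routing": parent_id, "body": body}


if __name__ == "__main__":
    print(json.dumps(child_document("product-1", "acme", 9.99), indent=2))
```

With the join mapping, a single child can be indexed or updated on its own, while with the nested mapping the children only exist as part of the root document.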
But my question arises when it comes to updating the child document!
Elastic recommends choosing nested when you have more read requests than CUD (create, update, delete) requests, and Parent-Child when CUD requests outnumber read requests.
But what does "too much CUD" actually mean? Is a document updated once every quarter hour? Every minute? Or even multiple times a second?
I have 20 million documents, and each document can have up to 200 nested documents.
I must update these nested documents regularly, so I update at least 10 million documents every hour. Some of these updates occur every 15 minutes, while others occur every 1 to 12 hours.
On the other hand, I receive millions of requests to read documents (search or read), and they require complex queries.
What are your thoughts on this?
Do I have to choose Parent-Child instead of nested? I prefer the nested approach because of the benefits it offers my queries and the performance it gives my read requests.
However, I'm a little concerned about the updates and the possible stress they may put on the system.
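
To put rough numbers on that concern, here is a back-of-the-envelope sketch. It relies on the fact that changing any nested object reindexes the root document together with all of its nested documents, since each nested object is stored as its own Lucene document. The assumptions are mine and are worst case: every one of the 10 million hourly updates hits a document with the full 200 nested documents, and in the Parent-Child case each update changes exactly one child.

```python
# Figures taken from the question above.
UPDATES_PER_HOUR = 10_000_000
MAX_NESTED_PER_DOC = 200


def lucene_writes_nested(updates, nested_per_doc):
    """Lucene documents rewritten per hour with the nested approach.

    Each update reindexes the root document plus all of its nested documents.
    """
    return updates * (nested_per_doc + 1)


def lucene_writes_join(updates, children_changed_per_update=1):
    """Lucene documents rewritten per hour with the Parent-Child approach.

    ASSUMPTION: each update changes only `children_changed_per_update`
    child documents and leaves the parent untouched.
    """
    return updates * children_changed_per_update


if __name__ == "__main__":
    nested = lucene_writes_nested(UPDATES_PER_HOUR, MAX_NESTED_PER_DOC)
    join = lucene_writes_join(UPDATES_PER_HOUR)
    print(f"nested:       {nested:,} Lucene docs/hour ({nested / 3600:,.0f}/s)")
    print(f"parent-child: {join:,} Lucene docs/hour ({join / 3600:,.0f}/s)")
```

These are upper bounds under the stated assumptions, not measurements; the real ratio depends on how many nested documents a typical document has and how many of them change per update.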