We are developing a platform where millions of records are updated or inserted daily.
We use MongoDB as the primary database and Elasticsearch as a secondary database (for search only).
There are two main MongoDB collections that we also need to import into Elasticsearch:
- Companies: this collection has 20+ fields.
- People: this collection has 15+ fields. Every person belongs to a company (each person document stores the company `_id`), so there is a one-to-many relation between the Companies and People collections.
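Roughly, the denormalization step in our pipeline does something like the following (a simplified sketch; the field names are illustrative, not our actual schema):

```python
# Simplified sketch of our denormalization step (field names are illustrative).
# For each person document we embed the fields of its company, so each
# People_index document is self-contained and sortable on company columns.

def denormalize_person(person: dict, company: dict) -> dict:
    doc = dict(person)  # copy the person fields (15+ in reality)
    # Prefix company fields to avoid name clashes with person fields.
    for key, value in company.items():
        doc[f"company_{key}"] = value
    return doc

person = {"_id": "p1", "name": "Alice", "company_id": "c1"}
company = {"_id": "c1", "name": "Acme", "industry": "Retail"}

merged = denormalize_person(person, company)
print(merged)
```

The merged document is what we index into People_index, so every company field is repeated on every person belonging to that company.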
There are two Elasticsearch indices that we use to filter the data:
- Company_index: here we store all the company data from the Companies collection in MongoDB.
- People_index: here we store the denormalized data. Company and person information are combined in this index because we need to sort on all columns and paginate.
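For reference, the People_index mapping looks roughly like this (simplified; the field names are illustrative, and text fields we sort on carry a `keyword` sub-field so they can be used in `sort`):

```
PUT people_index
{
  "mappings": {
    "properties": {
      "name":             { "type": "text", "fields": { "raw": { "type": "keyword" } } },
      "company_name":     { "type": "text", "fields": { "raw": { "type": "keyword" } } },
      "company_industry": { "type": "keyword" },
      "created_at":       { "type": "date" }
    }
  }
}
```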
We expect at least 50 million records in our database.
**We wrote our own pipeline to sync the data from MongoDB to Elasticsearch.**
- We are seeing JVM memory utilization of 98% and getting `circuit_breaking_exception` errors, so please help us tune our Elasticsearch cluster.
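The main knob we have looked at so far is the heap size in `jvm.options` (the values below are illustrative for a node with ~32 GB RAM, not settings we are sure about; our understanding is the heap should be about half of RAM and stay below ~32 GB so compressed object pointers remain enabled):

```
# jvm.options (illustrative values)
-Xms16g
-Xmx16g
```

We also saw that the parent circuit breaker limit (`indices.breaker.total.limit`) can be adjusted via cluster settings, but we have not changed it from the default.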
- Is it correct to store the data in a denormalized way in an Elasticsearch index when one company can have at most 2 lakh (200,000) records?
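For the pagination part, this is roughly the kind of query we run against People_index (a sketch; field names are illustrative, and it assumes `company_name` has a `keyword` sub-field named `raw` plus a unique `person_id` keyword field as a tiebreaker for `search_after`):

```
GET people_index/_search
{
  "size": 50,
  "sort": [
    { "company_name.raw": "asc" },
    { "person_id": "asc" }
  ],
  "search_after": ["Acme", "p1"]
}
```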