Elasticsearch tuning

We are developing a platform where millions of records are inserted or updated on a daily basis.
For this platform we are using MongoDB as the primary database and Elasticsearch as a secondary database (only for searching).

Database
There are 2 main MongoDB collections that we also need to import into Elasticsearch:

  1. Companies - This collection has 20+ fields in total.
  2. People - This collection has 15+ fields. Every person belongs to a company via company._id, so there is a one-to-many relation between the companies and people collections (simplified example documents below).
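Simplified, the two documents look roughly like this (only a few representative fields are shown; the field names are illustrative, not our exact schema):

```python
# Simplified MongoDB documents (representative fields only; names are illustrative).

company = {
    "_id": "64f1c2...",          # ObjectId of the company
    "name": "Acme Corp",
    "industry": "Manufacturing",
    "country": "US",
    # ... ~20 fields in total
}

person = {
    "_id": "64f1d9...",          # ObjectId of the person
    "company_id": "64f1c2...",   # reference to company._id (one-to-many)
    "first_name": "Jane",
    "last_name": "Doe",
    "title": "Engineer",
    # ... ~15 fields in total
}
```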

Elasticsearch Indexes

There are 2 Elasticsearch indexes that we use to filter the data:

  1. Company_index: Here we store all the company data from the company collection in MongoDB.
  2. People_index: Here we store the denormalized data. Company and person information are combined in this index because we need sorting and pagination across all columns (an illustrative mapping is sketched below).
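To make the denormalization concrete, the People_index mapping is roughly along these lines (sketched with the Python client; the field names are illustrative and only a subset of the real ones):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Illustrative People_index mapping: person fields plus the parent company's
# fields flattened into the same document, so any column can be sorted on.
es.indices.create(
    index="people_index",
    mappings={
        "properties": {
            "person_id":       {"type": "keyword"},
            "first_name":      {"type": "text", "fields": {"raw": {"type": "keyword"}}},
            "last_name":       {"type": "text", "fields": {"raw": {"type": "keyword"}}},
            "title":           {"type": "keyword"},
            # company fields copied from the companies collection
            "company_id":      {"type": "keyword"},
            "company_name":    {"type": "text", "fields": {"raw": {"type": "keyword"}}},
            "company_country": {"type": "keyword"},
            "updated_at":      {"type": "date"},
        }
    },
)
```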

We are expecting at least 50 million records in our database.

We wrote our own pipeline to sync the data from MongoDB to Elasticsearch.
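Conceptually the pipeline is a MongoDB change stream feeding the Elasticsearch bulk helper, roughly like the sketch below (simplified; the database, collection, and field names are placeholders, not our production code):

```python
from pymongo import MongoClient
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

mongo = MongoClient("mongodb://localhost:27017")
es = Elasticsearch("http://localhost:9200")
people = mongo["platform"]["people"]
companies = mongo["platform"]["companies"]

def denormalized_actions():
    """Watch the people collection and emit bulk actions that embed
    the parent company's fields (simplified sketch)."""
    with people.watch(full_document="updateLookup") as stream:
        for change in stream:
            if change["operationType"] not in ("insert", "update", "replace"):
                continue
            person = change["fullDocument"]
            company = companies.find_one({"_id": person["company_id"]}) or {}
            yield {
                "_op_type": "index",
                "_index": "people_index",
                "_id": str(person["_id"]),
                "_source": {
                    "person_id": str(person["_id"]),
                    "first_name": person.get("first_name"),
                    "last_name": person.get("last_name"),
                    "title": person.get("title"),
                    "company_id": str(person["company_id"]),
                    "company_name": company.get("name"),
                    "company_country": company.get("country"),
                },
            }

# Index changes in batches of 500 documents.
for ok, item in streaming_bulk(es, denormalized_actions(), chunk_size=500):
    if not ok:
        print("failed:", item)
```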

Questions:

  1. We are seeing JVM memory utilization of 98% and getting circuit_breaking_exception errors. Please help us tune our Elasticsearch cluster.
  2. Is storing the data in a denormalized Elasticsearch index the correct approach when one company can have at most 2 lakh (200,000) records?

You need to share information about the specs of your cluster so people can try to help you: how many nodes, how many indexes, how many shards per index, etc.

Also, if you can share examples of your documents it would help.
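For example, something along these lines would pull most of that out (Python client shown, assuming the index is named people_index; the equivalent _cat and _nodes requests in Kibana Dev Tools work just as well):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Node count, roles, heap and disk usage
print(es.cat.nodes(v=True, h="name,node.role,heap.percent,ram.percent,disk.used_percent"))

# Indexes with primary/replica shard counts, doc counts and sizes
print(es.cat.indices(v=True, h="index,pri,rep,docs.count,store.size"))

# Per-node circuit breaker statistics (relevant to the circuit_breaking_exception)
print(es.nodes.stats(metric="breaker"))

# One sample document from the denormalized index
print(es.search(index="people_index", size=1))
```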

Since you are combining the company and user information in one index, why do you need the Company_Index?

I would say that it is better to have everything in one index: for every entry in the people index you would also add the company fields related to that entry. But I understand that this is what you are already doing with the People_index, right?