Elasticsearch tuning

We are developing a platform where millions of records are updated or inserted on a daily basis.
For this platform we are using MongoDB as the primary database and Elasticsearch as a secondary database (only for searching).

Database
There are 2 main MongoDB collections that we also need to import into Elasticsearch:

  1. Companies - This collection has 20+ fields.
  2. People - This collection has 15+ fields. Every person belongs to a company via company._id, so there is a one-to-many relation between the Companies and People collections (see the example documents below).
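
For illustration, a minimal sketch of what the two document shapes might look like (all field names here are hypothetical; only the company._id reference comes from the description above):

```python
# Hypothetical MongoDB document shapes -- the real collections have 20+/15+ fields.
company_doc = {
    "_id": "64f1c2aa...",         # ObjectId of the company (placeholder value)
    "name": "Acme Corp",
    "industry": "Manufacturing",
    # ... more company fields
}

person_doc = {
    "_id": "64f1d9bb...",
    "first_name": "Jane",
    "last_name": "Doe",
    "company_id": "64f1c2aa...",  # references Companies._id -> one-to-many
    # ... more person fields
}
```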

Elasticsearch Indexes

There are 2 Elasticsearch indices that we use to filter the data:

  1. Company_index: Here we store all the company data from the Companies collection in MongoDB.
  2. People_index: Here we store the denormalized data; company and user information are combined in this index because we need to apply sorting on all columns, as well as pagination (see the query sketch below).
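
To make the sorting/pagination point concrete, here is a hedged sketch of the kind of query a denormalized index allows: sorting people by a copied company field and paginating with search_after (the index name, field names, and the person_id tiebreaker field are assumptions, not taken from the thread):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Page 1: sort by a denormalized company field plus a unique tiebreaker.
query = {"match": {"company_name": "acme"}}
sort = [{"company_name.keyword": "asc"}, {"person_id": "asc"}]

page1 = es.search(index="people_index", query=query, sort=sort, size=50)
hits = page1["hits"]["hits"]

# Page 2: search_after uses the sort values of the last hit as a cursor,
# which stays cheap where deep from/size pagination gets expensive.
if hits:
    page2 = es.search(index="people_index", query=query, sort=sort, size=50,
                      search_after=hits[-1]["sort"])
```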

We are expecting at least 50 million records in our database.

We wrote our own pipeline to sync the data from MongoDB to Elasticsearch.
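
The pipeline itself isn't shown in the thread; purely for illustration, a minimal sketch of this kind of sync could watch a MongoDB change stream and feed the Elasticsearch bulk helper (the database, index, and field names below are assumptions):

```python
from pymongo import MongoClient
from elasticsearch import Elasticsearch, helpers

mongo = MongoClient("mongodb://localhost:27017")
es = Elasticsearch("http://localhost:9200")
db = mongo["mydb"]

def actions():
    # Watch inserts/updates on People and emit one bulk action per change.
    with db["people"].watch(full_document="updateLookup") as stream:
        for change in stream:
            doc = change.get("fullDocument")
            if doc is None:          # e.g. deletes carry no fullDocument
                continue
            company = db["companies"].find_one({"_id": doc["company_id"]}) or {}
            yield {
                "_op_type": "index",
                "_index": "people_index",
                "_id": str(doc["_id"]),
                "_source": {
                    "person_id": str(doc["_id"]),
                    "first_name": doc.get("first_name"),
                    "last_name": doc.get("last_name"),
                    "company_id": str(doc["company_id"]),
                    # denormalized company fields copied onto the person
                    "company_name": company.get("name"),
                },
            }

# Batch writes instead of indexing one document per request.
for ok, item in helpers.streaming_bulk(es, actions(), chunk_size=500):
    if not ok:
        print("failed:", item)
```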

Questions:

  1. We are seeing JVM memory utilization at 98% and getting circuit_breaking_exception errors, so please help us tune our Elasticsearch cluster (see the diagnostic sketch below).
  2. Is storing the data in a denormalized way in an Elasticsearch index the correct approach when one company can have at most 2 lakh (200,000) records?
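
As a first diagnostic step for question 1 (not a tuning answer in itself), the nodes stats API shows per-node heap usage and which circuit breaker is tripping; a sketch using the Python client, assuming the default breaker names and a local host:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

stats = es.nodes.stats(metric=["jvm", "breaker"])
for node_id, node in stats["nodes"].items():
    print(node["name"], "heap used:", node["jvm"]["mem"]["heap_used_percent"], "%")
    for name, breaker in node["breakers"].items():
        print("  breaker", name,
              "estimated:", breaker["estimated_size"],
              "limit:", breaker["limit_size"],
              "tripped:", breaker["tripped"])
```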

You need to share information about the specs of your cluster so people can try to help you: how many nodes, how many indices, how many shards per index, etc.
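
For example, the _cat APIs give most of that information in one place; a quick sketch with the Python client (the host is an assumption):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Human-readable tables of nodes, indices, and shard distribution.
print(es.cat.nodes(v=True))
print(es.cat.indices(v=True))
print(es.cat.shards(v=True))
```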

Also, if you can share examples of your documents it would help.

Since you are combining the company and user information in one index, why do you need the Company_Index?

I would say it is better to have everything in one index: for every entry in the people index you would also add the company-related fields for that entry. But I understand that this is what you are already doing with the People_Index, right?

We are going to build our own cluster on AWS EC2 instances because Elastic Cloud costs more than our budget allows.
We are expecting at least 200 GB of data, with frequent updates and new records being inserted.

We stored the company data in company_index to avoid so many aggregations, since we also display only company data on a separate page.
So now my questions are:

  1. How many nodes do we need in our custom cluster?
  2. What would be the best configuration for those nodes to handle 200 GB of data?
  3. As you said, it would be good to store all the data in one index. Is that feasible when we are frequently updating the data? Suppose a company has 1 lakh (100,000) users and we update the company info; then we also need to update the company info in all 1 lakh user records (see the sketch below).
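
For question 3, the usual pattern for propagating a company change into denormalized people documents is one _update_by_query per changed company rather than 1 lakh individual updates; a hedged sketch (the index name, field names, and example ID are assumptions):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical example: a company was renamed, so rewrite the copied
# company fields on every person document that references it.
es.update_by_query(
    index="people_index",
    query={"term": {"company_id": "64f1c2aa..."}},
    script={
        "source": "ctx._source.company_name = params.name",
        "params": {"name": "Acme Corp (renamed)"},
    },
    conflicts="proceed",        # skip version conflicts from concurrent writes
    wait_for_completion=False,  # run as a background task for large companies
)
```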
