This is a very simple question, but it is hard to find an answer anywhere in the Elasticsearch documentation.
We are Elasticsearch Platinum license users on Elastic Cloud, and we have opted to deploy our services on AWS. We would like to build a search engine over 50 million rows of data with 10 columns each. 99% of our data is dynamic, and we update it once a week for new indexing (only a few changes, not everything).
So my questions are:
As per our understanding, the CPU and RAM of the Elasticsearch hot data and content tier apply to the indexing part only. Since we ingest data less frequently, we should be able to go with lower RAM and CPU. Is that correct?
Secondly, we are using Elastic App Search for our search, so the higher the CPU, the better the search query performance?
What is the role of zones (apart from snapshots or copies of data)?
If we opt for 3 zones in Elastic App Search, it shows that we have 6 GB RAM and up to 25.5 vCPU. Are our queries distributed among all three zones? Do we actually get the power of 25.5 vCPUs?
And the last question: do we need to add coordinating instances to make use of all the CPU power?
You might have got the idea of where we are heading. We would like to optimize our resources as much as we can and get the best out of them.
Hi @Sidhiksha_Sharda, welcome to the community, and thanks for trying Elastic App Search on Elastic Cloud.
Great questions. In short, you should always test your configurations (as I am sure you know).
I will try to answer these questions, but the actual best cost/performance balance depends on many things.
The hot tier is definitely used for indexing, and for your use case that is where the queries will be executed as well. This is common for "search" use cases: both ingest AND query happen in the hot tier.
How much RAM and CPU you need is highly dependent on your use case. I suspect that in your case the actual storage size of the data is pretty small, so I would focus on CPU-optimized profiles.
Yes... App Search is really the app that sits on top of the actual data nodes. Search use cases are typically CPU-bound.
2 or more zones:
a) Provide high availability: if one node crashes, Elasticsearch / your search use case will continue to function.
b) Yes (though it is a little more complicated): your searches may / will be distributed across multiple nodes, assuming you have a replica.
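As a concrete illustration of point b), a replica can be enabled through the index settings so a copy of each shard lives in another zone and can serve searches. (The index name `places` here is just a placeholder, not the actual index in this deployment; App Search manages its underlying indices for you, so this is the raw Elasticsearch form.)

```
PUT places/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}
```

With one replica, a search request can be routed to either the primary or the replica shard, which is how query load spreads across zones.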
I suspect you are looking at the AWS CPU-optimized ARM profile... good choice!
Looks like you are looking at 2 GB x 3 zones = 6 GB of RAM.
I would actually prefer 2 slightly larger nodes over 3 smaller ones...
You might consider 4 GB x 2 zones, just a thought... either would probably be fine.
Be clear that it is UP TO 25.5 vCPU; you are not guaranteed those CPUs until you reach larger node sizes, but the larger the node, the more of that CPU slice you should get.
No, I would not expect you to need coordinating nodes.
Questions for you:
What do you expect the queries/sec to be?
How complex are your queries? Simple term, text + term, etc.?
Which cloud provider did you choose, and which profile?
What is the total storage size on the primary copy of your data?
The query latency we are currently seeing varies a lot, from 300 ms to 1.2 s, and that variability is our main concern; we don't understand why it is behaving so erratically.
How complex are your queries? Simple term, text + term, etc.?
The queries are very simple. We have two columns used for search: Name and Address. Basically it is a mapping application, and we have created a geocoding engine, so a user could search a combination such as:
pizza locality
pizza city
pizza zip code
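For reference, a "pizza + location" search over two text columns like this is commonly expressed in the Elasticsearch Query DSL as a `multi_match` query. (The index name `places` and field names `name` and `address` below are assumptions for illustration, not the actual schema; App Search builds its own queries under the hood.)

```
GET places/_search
{
  "query": {
    "multi_match": {
      "query": "pizza 110001",
      "fields": ["name", "address"]
    }
  }
}
```

A query this simple is cheap per shard, which is why sizing for concurrency (CPU) tends to matter more here than sizing for query complexity.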
Which cloud provider did you choose, and which profile?
We have chosen AWS with:

Hot data and content tier, total (size x zone):
120 GB storage, 4 GB RAM, up to 2.2 vCPU

Enterprise Search instances, total across 3 zones (size x zone):
6 GB RAM, up to 25.5 vCPU
What is the total storage size on the primary copy of your data?
India is huge, so the data could go up to 50 GB.