Urgent: Looking for advice on selecting proper Elasticsearch service type with OPTIMIZED Configuration

We have been trying to move from core Lucene to Elasticsearch and have run into a couple of issues. Before asking our questions, we would like to share details of our current system, data and infrastructure, and our main targets:

Our System-Data-Infrastructure
a. Currently, we host Lucene 3.1 on a single standalone server without replication
b. Data-Size: 5 TB minimum to 15 TB maximum, and possibly increasing
c. Total Doc Count: At least 50 million and increasing
d. New Doc Indexing/Updating Time Interval: Daily
e. Full Re-indexing Time Interval: Once a month
f. Planned Hosting Region: US-East (Ohio)
g. Usage Up/Down Time: Up 24 hours a day
h. Traffic Hits: Moderately high

Our Main Target:

  1. Migrate to Elasticsearch with replication
  2. Integrate our own custom plugin, which works on Lucene/Elasticsearch queries

Our Questions relating to our Complex Situations:

  1. If we move to Elasticsearch, what would be the best infrastructure configuration, i.e.
    A. What combination of Master Nodes and Data Nodes is required to hold the above data size efficiently without compromising search speed?
    B. What is the best Instance Type (number of cores and memory size) to use for the Master Nodes and for the Data Nodes?
    C. Does adding more nodes in itself load-balance Elasticsearch, or is there another way to load-balance it?

  2. We have found out that we can implement Elasticsearch through the following vendors:
    A. Elasticsearch Service by Elastic Cloud
    a. If we use this, can we install our custom plugin, and how? Also, do we “own” the installation, or does it remain permanently under Elastic Cloud's control?
    b. Do Elastic Cloud solutions include load balancing for Elasticsearch automatically (or is load balancing based solely on the number of nodes)?
    c. If we deploy our application on a cloud (e.g. Google Cloud, AWS or Azure), how can we manage the VPC and its security while connecting to the Elasticsearch Service by Elastic Cloud?
    d. If we deploy our application on a cloud (e.g. Google Cloud, AWS or Azure), won't these cloud services charge for data transfer?
    e. Is there any package plan for a one-year or three-year contract?
    f. What level of technical support does Elastic Cloud provide?
    B. Elasticsearch Service by Elastic Cloud on the AWS Marketplace
    a. If we use it from the Marketplace, can we install our custom plugin, and how? Also, do we “own” the installation, or does it remain permanently under Elastic Cloud's control?
    b. Do Elastic Cloud solutions include load balancing for Elasticsearch automatically (or is load balancing based solely on the number of nodes)?
    c. If we deploy our application on a cloud (e.g. Google Cloud, AWS or Azure), how can we manage the VPC and its security while connecting to the Elasticsearch Service by Elastic Cloud from the Marketplace?
    d. If we deploy our application on a cloud (e.g. Google Cloud, AWS or Azure), won't these cloud services charge for data transfer?
    e. Is there any package plan for a one-year or three-year contract if we choose the Marketplace option?
    f. What level of technical support is provided for the Elasticsearch Service by Elastic Cloud from the Marketplace?
    C. AWS Elasticsearch Service
    Since it does not allow installing custom plugins, we do not think this service suits us.

I have some initial questions.

Is this measured in your current system or Elasticsearch? Does it include replicas? How much space is this expected to take up in Elasticsearch?

As far as I can see, this region is not supported on the Elasticsearch Service by Elastic Cloud, which may limit your choice. Is any other region an option?

This is the size of the Lucene index on the current server (not Elasticsearch). We are not maintaining the replicas on the current server but would like to have replicas after switching to Elasticsearch.

We can consider other regions too if there is no other option.

I would recommend indexing a part of your data into Elasticsearch so you can better estimate how much space you will need. There have been a lot of changes between the Lucene version you use and the one currently used by Elasticsearch.
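As a rough illustration of that extrapolation, here is a minimal Python sketch. It assumes index size grows roughly linearly with document count, which may not hold for your data, and it ignores operational headroom. The function name and the sample figures are hypothetical, not measurements.

```python
def estimate_cluster_storage_bytes(sample_doc_count, sample_index_bytes,
                                   total_doc_count, replicas=1):
    """Extrapolate total on-disk size from a small test index.

    Assumption: size scales linearly with document count.
    `replicas` is the number of replica copies per primary shard.
    """
    if sample_doc_count <= 0:
        raise ValueError("sample_doc_count must be positive")
    bytes_per_doc = sample_index_bytes / sample_doc_count
    primary_bytes = bytes_per_doc * total_doc_count
    # Each replica is a full copy of the primary data.
    return primary_bytes * (1 + replicas)


# Hypothetical numbers: a 500,000-document sample that took 50 GB on disk,
# scaled to 50 million documents with one replica.
total = estimate_cluster_storage_bytes(500_000, 50 * 10**9, 50_000_000, replicas=1)
print(f"{total / 10**12:.1f} TB")
```

Measure the sample size on the test index itself (primaries only) and rerun the estimate with your own figures.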

Most of these questions are probably best answered from someone on the Elastic Cloud team, but I will have a go.

As far as I am aware, options A and B are the same service. The main difference is in how you pay for it, although there may be some other differences too.

You will need to run some tests to see how much space your data will take up in Elasticsearch, and benchmark to see if it meets your search speed requirements.
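For the benchmarking part, a small timing harness along these lines may be a starting point. This is only a sketch: `run_query` is a hypothetical callable that you would write yourself to send one search to your test cluster, and the percentile uses the simple nearest-rank method.

```python
import math
import statistics
import time


def measure_latencies(run_query, queries, repeats=3):
    """Time each query `repeats` times; return latencies in milliseconds.

    `run_query` is a placeholder (an assumption) for whatever function
    sends a single search request to the cluster under test.
    """
    latencies_ms = []
    for query in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(query)
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Return count, median and nearest-rank 95th percentile."""
    ordered = sorted(latencies_ms)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "count": len(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
    }
```

Use queries that are representative of your real traffic, and run them against an index of realistic size, since latency on a small sample can be misleading.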

On Elastic Cloud, I/O Optimized nodes are generally recommended for search use cases.

I am not sure I understand your question. Could you please clarify?

This is described here. You will probably need to contact Elastic for more details.

Elastic Cloud provides a load balancer in front of the cluster.

This is a question for Elastic.

Yes, there are typically data transfer charges.

I believe it is possible to sign longer term contracts.

In order to upload custom plugins you need Gold or Platinum support, so support is included. Contact Elastic for further details.
