Setting up Elasticsearch in a production environment

I want to know what configuration setup would be ideal for my case. I have 4 servers (nodes), each with 128 GB RAM. All 4 nodes will be in one cluster.

The total number of indexes would be 10, each receiving 1,500,000 documents per day.

Since I'll have 4 servers (nodes), I'll set master:true and data:true on all of them, so that if one node goes down, another becomes master. Every index will have 5 shards.
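For reference, that plan would look roughly like this in each node's elasticsearch.yml — a sketch only, assuming the 2.x-era setting names, with the cluster name as a placeholder:

```
cluster.name: my-cluster     # placeholder cluster name
node.master: true            # eligible to be elected master
node.data: true              # also holds and serves data
index.number_of_shards: 5    # default primary shards per index
```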

I want to know which configuration parameters I should alter in order to get the most out of Elasticsearch.

Also tell me how much memory is enough for my usage, since I'll have very frequent select queries in production (maybe 1000 requests per second).

Need a detailed suggestion.

This is hard to answer without saying 'it depends'. The configuration parameters that are best for your use case depend on a huge number of variables, such as the size of your documents, how many fields a document contains, what types of fields you have, how the fields are analyzed, and what queries you run, to name just a few. The best way to tune your Elasticsearch instance is to run representative queries on your data (or a sample of it), measure performance (memory usage, query times, etc.), identify bottlenecks, and search the documentation and the archives of this forum for ways to fix those bottlenecks.

Of course, if you can't find the way to resolve a particular bottleneck you can also open a topic in this forum. If you do, please try to give as much information as possible about the problem you are seeing and what you have tried to resolve it. This will help a lot in finding a solution for you quickly.

There are a few pointers to get you started, though. The book 'Elasticsearch - The Definitive Guide' is free online and has a chapter dedicated to things to consider for production deployments. You can find it here:

I would recommend reading as much of that book as you can as it will give you an understanding of what Elasticsearch is doing under the covers and how you might be able to solve the problems you encounter.

Also, we generally recommend assigning 50% of your server's memory to the JVM heap and leaving the rest to the file system cache (Elasticsearch makes heavy use of the file system cache). You did say, however, that your servers have 128GB of memory, so 50% would be 64GB, which is above the maximum recommended heap size of 31GB (above that, the JVM cannot use compressed pointers and performance actually decreases). Here you can either assign each node 31GB of JVM heap and leave the remainder to the file system cache, or run two nodes on each server, each with a 31GB heap. More information on this can be found here (especially in the 'Don't Cross 32 GB!' section):
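In practice that means pinning the heap at 31GB, with min and max set equal so the heap never resizes at runtime. How you set it depends on your version (ES_HEAP_SIZE in older releases, jvm.options in newer ones) — this is just a sketch:

```
# jvm.options (or: export ES_HEAP_SIZE=31g on older versions)
-Xms31g
-Xmx31g
```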


@Harlin_ES You could be a little more helpful here.

@colings86 thanks for your detailed answer. Here are a few more questions continuing from the above.

Here is the scenario explained:
I have four nodes (4 servers) divided between two regions, e.g. 2 servers in Germany and the remaining 2 in France.

Germany server naming (g1 and g2) -> g1 being the primary server and g2 its failover.
France server naming (f1 and f2) -> f1 being the primary server and f2 its failover.

Now we have ES setup on g1, g2, f1 and f2 under one cluster.

My web application resides on g1 and f1 (with failover on g2 and f2) and talks to Elasticsearch, which is available on all servers. What I want is to query the nearest ES node: the web app on g1 should ask the g1 ES node for data, and likewise the app on f1 should ask the f1 node. However, if the app on g1 asks the g1 node and it is unavailable at that time, I want the g2 node to respond instead, since g2 is the next nearest, so the data is still received quickly.
My questions are:

  1. How can I direct queries to the specific nodes nearest to my web app, so they respond quickly, rather than querying a node in the other region?
  2. If the nearest node fails, how can I query the next nearest node (which is up)?
  3. I want my data to be present on all the nodes, so is it good practice to set master:true and data:true on all nodes?

Don't run cross-DC clusters; we don't support them due to latency sensitivity and the potential for split brain.
It also makes questions like the ones you are asking harder to answer.

Have two clusters, replicate the data between them, and then query each cluster locally.
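To illustrate the "query the local cluster, fall back if it's unreachable" idea, here's a minimal Python sketch. The client objects, hostnames, and ordering are stand-ins for illustration; a real setup would use an Elasticsearch client, which has its own connection-retry options:

```python
# Minimal sketch: try the nearest endpoint first, fall back to the next.
# The FakeClient objects stand in for real Elasticsearch clients; the
# g1/g2 names are assumptions taken from the scenario above.

def query_with_failover(endpoints, run_query):
    """Try each endpoint in order of proximity; return the first success.

    endpoints: list of (name, client) tuples, nearest first.
    run_query: function taking a client and returning a result,
               raising an exception if that endpoint is unreachable.
    """
    last_error = None
    for name, client in endpoints:
        try:
            return name, run_query(client)
        except Exception as err:  # endpoint down or unhealthy
            last_error = err
    raise RuntimeError("all endpoints failed") from last_error


class FakeClient:
    """Stand-in for a real search client."""
    def __init__(self, up, data):
        self.up, self.data = up, data

    def search(self):
        if not self.up:
            raise ConnectionError("node down")
        return self.data


# Example: g1 is down, so the query falls back to g2.
german_endpoints = [("g1", FakeClient(False, None)),
                    ("g2", FakeClient(True, {"hits": 3}))]
name, result = query_with_failover(german_endpoints, lambda c: c.search())
# → ("g2", {"hits": 3})
```

The same ordering trick applies per region: the app on f1 would list (f1, f2) nearest-first instead.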

@warkolm In case there is a physical disaster in one region, say Germany, and I heavily rely on my ES data, how can I get my data back if I haven't set up another node in some other region, say France? Please guide.

You need to send data to both clusters separately, or replicate it using a queuing mechanism.

@warkolm can you please give us an example of cluster replication?

  1. Get your app to write to each cluster directly.
  2. Get your app to write to a messaging queue and then use that to replicate to each cluster.
  3. Use the snapshot and restore API to get data from one cluster to another.
  4. Use Logstash to copy the data across.
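Option 1 above can be sketched roughly like this in Python. The clients and the index name are made up for illustration; in a real application a failed write to one cluster should go to a retry queue rather than be silently dropped:

```python
# Sketch of option 1: the application itself writes every document to
# both clusters. FakeCluster stands in for a real Elasticsearch client;
# "logs-2016" is a hypothetical index name.

def dual_write(clusters, index, doc):
    """Write `doc` to `index` on every cluster; report per-cluster success."""
    results = {}
    for name, client in clusters.items():
        try:
            client.index(index=index, document=doc)
            results[name] = True
        except Exception:
            results[name] = False  # candidate for a retry queue
    return results


class FakeCluster:
    """Stand-in client that records indexed documents."""
    def __init__(self):
        self.docs = []

    def index(self, index, document):
        self.docs.append((index, document))


clusters = {"germany": FakeCluster(), "france": FakeCluster()}
status = dual_write(clusters, "logs-2016", {"msg": "hello"})
# status == {"germany": True, "france": True}
```

Option 2 is the same idea with a message queue in the middle, which also buffers writes while one cluster is unreachable.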