Cross site replication using zoning and response time impact

Sivaramu · May 7, 2017, 7:28am

After recently enabling zoning on a production cluster, we have started noticing slower response times for search / scroll requests.

Version : 2.4
Cluster : 18 Nodes (12 data 8GB + 3 master 1GB + 3 coordinator 8GB)
Data Volume: ~ 500 Mil records - Data (Not logs)
Data Type : Time series - Daily Indices (Indices older than 2 months are merged into monthly indices)
Indices : 500 Indices / 1000 Shards / 2004 Segments / 620GB Size
Index Settings : 2 shards and one replica (now with zoning enabled we want the replica in that zone).
Hot-Warm architecture - Indexing and searching happen in one zone only; replicas are created in the other zone.
TransportClient is used for both Indexing and Searching
elasticsearch.yml : All settings are default settings.

Indexing throughput was 18K per second without zoning and is now 10K per second. Is this expected for cross-site replication? The sites are connected with a 10GBPS pipe, and the initial replication was quite fast.

Search requests that took less than a second are taking more than 3 seconds. Scrolling through an index has shown a similar impact.

I have tried the query parameter preference=_only_nodes:zone:ZONEA, which shows a remarkable improvement from a REST client; however, on the bulk runs using the Transport Client it does not seem to have any effect.

As discuss is blocked from my org, I could not share the cluster stats. Are there any specific practices / recommendations for Hot-Warm replication using zoning with the preference query param? What preference is most suitable for searching and scrolling, and should I also use it while indexing?

Christian_Dahlqvist · May 7, 2017, 8:36am

I am not sure I understand what you are doing. The purpose of the hot/warm architecture is to have a set of nodes that have very fast disks (typically local SSDs) performing all indexing into primaries and replicas. They also serve queries against the most recent data, which in many cases is queried most frequently. These nodes hold relatively little data as it is accessed frequently. The purpose of warm nodes is to hold older indices that are basically read-only. Since they do not handle any I/O-intensive indexing, they can have slower storage and more data per node.

It sounds to me like you may have defined your zones in a different way, which would not be a hot/warm architecture. Could you please clarify?

Sivaramu · May 7, 2017, 3:35pm

Thanks for your time @Christian_Dahlqvist.

From your explanation I realized that my usage of Hot-Warm is incorrect. I confused Hot-Warm sites with the ES Hot-Warm setup. Let me explain what I am trying to do with zoning.

From an infrastructure point of view, our secondary site is a cold site, and so far we have been using NAS storage replication to replicate the index. Now we are trying to use ES zoning to replicate the data to the cold site (ES nodes will be up).

What we want?
Search / scroll requests should always go to nodes on the Primary Site (Zone A). Indexing requests also go to the Primary Site, but the data should be replicated to the Secondary Site.

We are OK with delays in indexing requests, as indexing happens in batches, but search / scroll requests should be reasonably fast. Until now, the response times for search / scroll requests have been well under a second.

What did I try so far?
When I run _search_shards, I see that primary / replica shards are selected across the zones (sites). This makes sense, as it is a single cluster and search requests could be redirected to any node. I am trying to tweak ES settings to force it to use nodes at the primary site.
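The check described above can be sketched in plain Java. This is only an illustration: `ShardZoneCheck` and `ShardCopy` are hypothetical names, the zone value "ZONEB" used in any example data is made up, and the sketch assumes each shard copy from the _search_shards output has already been matched to the zone attribute of the node that holds it.

```java
import java.util.ArrayList;
import java.util.List;

final class ShardZoneCheck {
    // Hypothetical, simplified view of one shard copy taken from a _search_shards response.
    static final class ShardCopy {
        final String index;
        final int shard;
        final String zone; // value of the holding node's zone attribute (assumed to be known)

        ShardCopy(String index, int shard, String zone) {
            this.index = index;
            this.shard = shard;
            this.zone = zone;
        }
    }

    // Returns every copy that would be served from outside the wanted zone.
    static List<ShardCopy> outsideZone(List<ShardCopy> copies, String wantedZone) {
        List<ShardCopy> result = new ArrayList<>();
        for (ShardCopy copy : copies) {
            if (!wantedZone.equals(copy.zone)) {
                result.add(copy);
            }
        }
        return result;
    }
}
```

An empty result for "ZONEA" would mean the request is served entirely from the primary site; any entry in the list is a copy picked from the other zone.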

From the documentation I see that I can use the preference query parameter to direct requests to nodes in the Primary Site. From a REST client I see that the correct shards are picked and response times are clearly better; even a JMeter load test shows better response times. However, requests from the Transport Client do not show any difference with something like this: setPreference("_only_nodes:zone:ZONEA").execute().actionGet();
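For reference, the preference value is just a string of the form _only_nodes:<attribute>:<value>. Below is a minimal sketch, in plain Java with no Elasticsearch dependency, of how that string and the equivalent REST search URL can be assembled. `PreferenceUrl` is a hypothetical helper, not part of any Elasticsearch client, and the host and index names in the usage note are made-up placeholders.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class PreferenceUrl {
    // Builds the preference value quoted in this thread, e.g. "_only_nodes:zone:ZONEA".
    static String onlyNodes(String attribute, String value) {
        return "_only_nodes:" + attribute + ":" + value;
    }

    // Appends the preference as a URL-encoded query parameter to a _search URL.
    static String searchUrl(String host, String index, String preference) {
        try {
            return "http://" + host + "/" + index + "/_search?preference="
                    + URLEncoder.encode(preference, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `PreferenceUrl.searchUrl("es-zonea-1:9200", "events-2017.05.07", PreferenceUrl.onlyNodes("zone", "ZONEA"))` yields a URL ending in `preference=_only_nodes%3Azone%3AZONEA`. The unencoded string from `onlyNodes` is the same value passed to setPreference(...) on the Transport Client side.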

My question is: is there any setting in elasticsearch.yml that I can use to force ES to direct requests to nodes in the Primary Site? Or is there any other way I can achieve the replication described above?

Apologies for being verbose, I have started on this forum today.

Christian_Dahlqvist · May 7, 2017, 4:34pm

If you had been using Elasticsearch 5.x, it looks like you might be able to use the _prefer_nodes preference option to achieve this. The rationale behind this new feature is described in this GitHub issue. In Elasticsearch 2.4, however, I do not think it is possible.

Indexing into a primary shard is basically the same amount of work as indexing into a replica shard, so spreading a cluster across multiple datacenters will increase indexing latency and likely also reduce throughput. You also cannot control which zone the primary shard will be located in, as Elasticsearch needs to be able to promote replicas to primary when needed.
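A rough back-of-envelope model of this effect, written as a Java sketch. It is not how Elasticsearch measures anything: `ReplicationLatencyModel` is a hypothetical name, the model assumes one replica and that a bulk request completes only after the primary and then the remote replica have done their work, and any millisecond values fed into it are illustrative assumptions rather than measurements from this cluster.

```java
final class ReplicationLatencyModel {
    // Estimated time for one bulk request: primary work, then the network
    // round trip to the replica's site plus the replica's own indexing work.
    static double bulkLatencyMs(double primaryMs, double replicaMs, double roundTripMs) {
        return primaryMs + roundTripMs + replicaMs;
    }

    // Documents per second for a given number of bulk requests in flight at once.
    static double docsPerSecond(int docsPerBulk, int concurrentBulks, double latencyMs) {
        return concurrentBulks * docsPerBulk * 1000.0 / latencyMs;
    }
}
```

In this model the replica work costs the same wherever the replica lives; what a second site adds is the round-trip term, so with a fixed number of concurrent bulk requests throughput falls as that term grows. Sending more bulk requests in parallel is the usual way to hide such latency, at the cost of more load on the cluster.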

Sivaramu · May 8, 2017, 5:40am

Thanks for the link @Christian_Dahlqvist

I have read it and we will not be able to upgrade to 5.x due to pending FOSS approvals.

While tracing the issue with the Transport Client, I noticed that the initial search requests are sent to nodes based on my preference, but subsequent scroll requests are pulled from shards across zones.

SearchScrollRequestBuilder does not have any way to set a preference. Should it inherit the preference from the initial SearchRequest? Is this an issue with the API, or am I missing something here?

system · June 5, 2017, 5:49am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
