We are due to deploy Elasticsearch (along with Graylog) to our offices in London, New York, Seattle & San Francisco.
We have a server in each remote office (Graylog & Elasticsearch). However, I have a small question/problem.
We will be logging quite a lot of information. What I would like to do is:
Main Office: London
Store all information in the Elasticsearch cluster here.
New York (Main US Office):
Store all North American office logs in the Elasticsearch database here (NY, Seattle & San Francisco logs will be saved in this Elasticsearch DB).
(Seattle and San Francisco will store logs ONLY from their own office, so network devices and syslog messages in each office will be stored on that office's own Elasticsearch database only.)
I don't want Seattle & San Francisco to hold the whole Elasticsearch DB, as those offices are small and only about 10 devices will be logging to ES/Graylog, whereas NYC and the UK will each have > 100 devices/servers logging to them.
I want the UK and NYC to hold ALL the information in the ES cluster, and Seattle and San Francisco to hold only their own logs but also send them to NYC so we have a backup. Is that possible?
I've seen shard allocation filtering but I'm unsure how to go ahead with it. Sorry if this is confusing; it's been a project for > 6 months and ideally we want to roll it out in the next 30-40 days.
Thank you for all your help,
Michel
Junior Sys Admin
If you are happy shipping things over the wire, then I'd have a cluster per continent, with indices split per site and per source (i.e. network, system, etc.).
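If you do keep the North American offices in one cluster, the shard allocation filtering you mentioned mostly comes down to tagging each node with a custom attribute and pinning each per-site index to it. A rough sketch in Python against the REST API (the endpoint, the "site" attribute and the index names are placeholders, not anything from your setup):

```python
# Sketch only: assumes each node's elasticsearch.yml carries a custom attribute
# such as `node.attr.site: seattle` (plain `node.site:` on older 1.x/2.x
# releases) and that per-site indices follow a naming scheme like
# graylog-seattle-*.
import requests

ES = "http://localhost:9200"  # assumed cluster endpoint


def pin_index_to_site(index, site):
    """Require every shard of `index` to live on nodes tagged with `site`."""
    r = requests.put(
        f"{ES}/{index}/_settings",
        json={"index.routing.allocation.require.site": site},
    )
    r.raise_for_status()
    return r.json()


# e.g. keep Seattle's logs on Seattle nodes only
pin_index_to_site("graylog-seattle-2016.01", "seattle")
```

Keep in mind this only controls where shards sit inside a single cluster; it doesn't give you a second copy in NYC by itself, which is why per-continent clusters plus shipping the data separately tends to be the simpler mental model.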
I'm looking for similar functionality, but for disaster recovery. We have two datacenters, one that houses our DR environment and another that houses our production. Our production environment runs robust servers (SSDs, etc.) while our DR environment would be more spartan, with VMs mounting NFS shares (definitely not ideal, but it would allow us to operate our business).
Based on what I've been able to piece together, we'd want a separate "disaster recovery" cluster set up at our DR center that operates completely independently of production, and we'd want to sync data between them somehow.
Is there a way this can be automated on some kind of schedule? Is that something we'd have to do on our own with cron jobs, or is there support for this sort of thing natively?
You could feed the same data to multiple clusters (see Kafka, Kafka MirrorMaker) or maybe you could use ES snapshots if not being 100% up to date in secondary DCs is acceptable.
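For the snapshot route, registering a shared-filesystem repository and taking a snapshot is just two REST calls. A hedged sketch in Python (the repository name, mount path, snapshot name and index pattern are made up; the path must be mounted on every node and whitelisted via `path.repo` in elasticsearch.yml):

```python
import requests

ES = "http://localhost:9200"  # assumed production endpoint
REPO = "dr_backup"            # hypothetical repository name

# Register a shared-filesystem snapshot repository.
requests.put(
    f"{ES}/_snapshot/{REPO}",
    json={"type": "fs", "settings": {"location": "/mnt/es-backups", "compress": True}},
).raise_for_status()

# Snapshot the indices you care about and block until the snapshot finishes.
requests.put(
    f"{ES}/_snapshot/{REPO}/snapshot_2016_01_01",
    params={"wait_for_completion": "true"},
    json={"indices": "graylog-*", "ignore_unavailable": True},
).raise_for_status()
```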
Yeah, I've been looking around, but there doesn't seem to be an "official" way to do it. Most answers point to taking a snapshot from prod and restoring it in your "mirrored" environment, or, if you need it live, doing "distributed indexing" or something, where whenever you index into one cluster you distribute that indexing request to multiple clusters.
We'll probably end up doing something where we regularly snapshot the prod index to a network file share, then periodically restore a full snapshot to DR just to keep it "close" to production. In our case we don't need the DR index to be always 100% caught up with production, just relatively close.
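For what it's worth, the "periodically restore to DR" half can be a small cron job as well. A sketch, assuming the DR cluster sees the same file share registered (ideally read-only) under the same hypothetical repository name, and that it's acceptable to drop the stale DR copies before each restore; none of these hostnames or index patterns come from the thread:

```python
#!/usr/bin/env python
"""Cron-driven restore of the latest snapshot into a DR cluster (sketch)."""
import requests

DR_ES = "http://dr-es.example.com:9200"  # assumed DR cluster endpoint
REPO = "dr_backup"                       # same repo, registered read-only on DR

# Find the most recent snapshot in the repository.
snaps = requests.get(f"{DR_ES}/_snapshot/{REPO}/_all").json()["snapshots"]
latest = sorted(snaps, key=lambda s: s["start_time_in_millis"])[-1]["snapshot"]

# A restore won't touch an open index, so drop the stale DR copies first.
requests.delete(f"{DR_ES}/graylog-*")

# Restore the snapshot and wait for it to complete.
requests.post(
    f"{DR_ES}/_snapshot/{REPO}/{latest}/_restore",
    params={"wait_for_completion": "true"},
    json={"indices": "graylog-*", "ignore_unavailable": True},
).raise_for_status()
```

Run from cron at whatever interval keeps DR "close enough" for you; the repeated full restore is crude but matches the "relatively close, not 100% caught up" requirement.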