Elasticsearch on a wide scale around the globe

(Michel Laporte) #1


We are due to deploy Elasticsearch (along with Graylog) to our offices in London, New York, Seattle and San Francisco.

We have a server in each remote office (running Graylog and Elasticsearch). However, I have a small question/problem.

We will be logging quite a lot of information. What I would like to do is:

Main office: London
Store all information in the Elasticsearch cluster here.

New York (main US office):
Store all North American office logs in the Elasticsearch database here (NY, Seattle and San Francisco logs will be saved in this Elasticsearch DB).

Seattle and San Francisco will store logs ONLY from their own office, so logs from network devices and syslog messages in each office will be stored only in that office's Elasticsearch database.

I don't want Seattle and San Francisco to hold the whole Elasticsearch DB, as those offices are small and only about 10 devices will be logging to ES/Graylog, whereas NYC and the UK will each have more than 100 devices/servers logging.

I want the UK and NYC to hold ALL the information in the ES cluster, and Seattle and San Francisco to hold only their own logs but also send them to NYC so we have a backup. Is that possible?

I've seen shard allocation filtering but am unsure how to go ahead with it. Sorry if this is confusing; it's been a project for more than 6 months and ideally I want to roll it out in the next 30-40 days.

Thank you for all your help,
Junior Sys Admin

(Mark Walkom) #2

You do not want to create a single cluster that spans all these sites. ES is latency sensitive and any networking issues would cause you dramallamas.

The best option if you want to do this is to use snapshot + restore to copy data around.
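
As a rough illustration of the snapshot + restore approach, the sketch below only builds the three REST requests involved (register a repository, take a snapshot, restore it) without sending them. The repository name `site_backups`, the mount path `/mnt/es_backups`, the snapshot name and the index pattern are all made-up placeholders. It assumes a shared-filesystem (`fs`) repository, which also needs the location whitelisted via `path.repo` on every node, and it assumes the receiving cluster can register a repository that points at the same files or a copy of them.

```python
import json


def register_repo_request(repo, location):
    """Build PUT /_snapshot/<repo> for a shared-filesystem repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return ("PUT", f"/_snapshot/{repo}", body)


def create_snapshot_request(repo, snapshot, indices):
    """Build the request that snapshots the given indices into the repository."""
    body = {"indices": ",".join(indices)}
    path = f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return ("PUT", path, body)


def restore_snapshot_request(repo, snapshot, indices):
    """Build the request the receiving cluster would run to restore the indices."""
    body = {"indices": ",".join(indices)}
    return ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", body)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    steps = [
        register_repo_request("site_backups", "/mnt/es_backups"),
        create_snapshot_request("site_backups", "snap-1", ["logs-sea-*"]),
        restore_snapshot_request("site_backups", "snap-1", ["logs-sea-*"]),
    ]
    for method, path, body in steps:
        print(method, path, json.dumps(body))
```

Each tuple can be handed to whatever HTTP client you already use. Keeping the request building separate from the sending makes it easy to check what would be sent before pointing it at a real cluster.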

(Michel Laporte) #3

So would you recommend a cluster for North America and a cluster for the UK?

Or a different cluster for each office (even if Seattle and San Francisco will be fairly small)?


(Mark Walkom) #4

If you are happy shipping things over the wire then I'd have a cluster per continent, with indices split per site and per source (i.e. network, system, etc.).

(Michel Laporte) #5

Okay thank you so much for clarifying this.

I will set up multiple clusters.

How do you split indices?

(Ron) #6

I'm looking for similar functionality but for disaster recovery. We have two datacenters, one that houses our DR environment and another that houses our production. Our production environment runs robust servers (SSDs, etc.) while our DR environment would be more spartan, with VMs mounting NFS shares (definitely not ideal, but it would allow us to operate our business).

Based on what I've been able to piece together, we'd want a separate "disaster recovery" cluster setup at our DR center that operates completely independently of production, and we'd want to sync data between them somehow.

Is there a way this can be automated on some kind of schedule? Is that something we'd have to do on our own with cron jobs or is there support for this sort of thing natively?

(Otis Gospodnetić) #7


You could feed the same data to multiple clusters (see Kafka, Kafka MirrorMaker) or maybe you could use ES snapshots if not being 100% up to date in secondary DCs is acceptable.



(Mark Walkom) #8

Just name them differently for each data type/source :slight_smile:
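
One way to read "name them differently" is a fixed naming convention that encodes site and source in the index name. The sketch below is only an assumption about what such a convention could look like: the `logs-<site>-<source>-<date>` layout and the helper names are hypothetical, not something Elasticsearch or Graylog prescribes. The one hard constraint it relies on is that Elasticsearch index names must be lowercase.

```python
from datetime import date


def index_name(site, source, day):
    """Build a per-site, per-source daily index name (hypothetical layout)."""
    site_part = site.strip().lower().replace(" ", "-")
    source_part = source.strip().lower().replace(" ", "-")
    return f"logs-{site_part}-{source_part}-{day:%Y.%m.%d}"


def site_pattern(site):
    """Wildcard pattern matching every index that belongs to one site."""
    return f"logs-{site.strip().lower().replace(' ', '-')}-*"


if __name__ == "__main__":
    print(index_name("NYC", "network", date(2016, 3, 1)))
    print(site_pattern("Seattle"))
```

With names like these, a wildcard such as `logs-seattle-*` selects one office's data for searching or for a snapshot, without touching the other sites' indices.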

(Mark Walkom) #9

What Otis mentioned! There's also a bunch of other threads with similar questions that may provide other info.

(Michel Laporte) #10

Thanks for your help Mark :slight_smile:

(Ron) #11

Thanks @otisg, I'll look into that!

(Ron) #12

Yeah, I've been looking around but there doesn't seem to be an "official" way to do it. Most answers point to taking a snapshot from prod and restoring it in your "mirrored" environment or, if you need it live, doing some kind of "distributed indexing" where every indexing request sent to one cluster is also sent to the others.
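
A minimal sketch of the "distributed indexing" idea, under some loud assumptions: the cluster names and URLs are invented, the `send` callable stands in for whatever HTTP client you use, and the `/<index>/_doc` path matches recent Elasticsearch releases (older ones expect a type name in that position). It does not retry or queue failed writes, which a real setup would need in order to keep the clusters from drifting apart.

```python
import json


def fan_out(doc, index, clusters, send):
    """Send the same index request to every cluster.

    clusters maps a label to a base URL; send(method, url, body) is a
    caller-supplied function that performs the HTTP call and raises on error.
    Returns the labels of the clusters where the write failed.
    """
    body = json.dumps(doc)
    failed = []
    for name, base_url in clusters.items():
        url = f"{base_url.rstrip('/')}/{index}/_doc"
        try:
            send("POST", url, body)
        except Exception:
            # Record the failure and keep going so one bad site
            # does not block writes to the others.
            failed.append(name)
    return failed


if __name__ == "__main__":
    def print_send(method, url, body):
        print(method, url, body)

    # Hypothetical endpoints, for illustration only.
    targets = {"prod": "http://prod-es:9200", "dr": "http://dr-es:9200/"}
    fan_out({"message": "link down"}, "logs-nyc-network", targets, print_send)
```

Putting a queue such as Kafka in front of the clusters, as suggested earlier in the thread, covers the same need while also buffering writes when one site is unreachable.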

We'll probably end up doing something where we regularly snapshot the prod index to a network file share, then periodically restore a full snapshot to DR just to keep it "close" to production. In our case we don't need the DR index to be always 100% caught up with production, just relatively close.

(Mark Walkom) #13

As with most things ES, it depends on your use case and requirements :slight_smile:
