Cluster Replication failover


(Fernando) #1

I'm currently working on automating cluster availability. We have two Elasticsearch clusters in two different regions:

  • Primary in CA.
  • Standby in ATL.

Each cluster has two servers. Snapshots are enabled on both, and the primary is the one currently serving our application.

We use HAProxy to point our application at either the primary or the standby, depending on whether we need to perform maintenance; if one cluster fails, we manually point it to the one that is up.
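A minimal sketch of automating the "which cluster is up" decision, assuming each cluster's `_cluster/health` endpoint is reachable from wherever the check runs. `PRIMARY_URL`, `STANDBY_URL`, `cluster_status` and `pick_backend` are hypothetical names, and how the result is applied to HAProxy is left out.

```shell
#!/bin/bash
# Hypothetical helper: decide which cluster HAProxy should send traffic to.
PRIMARY_URL="http://primary-ca.example:9200"   # assumption: CA cluster address
STANDBY_URL="http://standby-atl.example:9200"  # assumption: ATL cluster address

# Prints green/yellow/red from _cluster/health, or "unreachable" if the
# cluster does not answer within 5 seconds or returns no status field.
cluster_status() {
  local body status
  body=$(curl -s --max-time 5 "$1/_cluster/health") || { echo unreachable; return; }
  status=$(echo "$body" | grep -o '"status":"[a-z]*"' | cut -d'"' -f4)
  echo "${status:-unreachable}"
}

# Prefers the primary whenever it is green or yellow; otherwise falls back to
# the standby if that one is green or yellow; otherwise prints "none".
pick_backend() {
  local primary=$1 standby=$2
  case $primary in
    green|yellow) echo primary; return ;;
  esac
  case $standby in
    green|yellow) echo standby ;;
    *) echo none ;;
  esac
}
```

For example, `pick_backend "$(cluster_status "$PRIMARY_URL")" "$(cluster_status "$STANDBY_URL")"` prints `primary`, `standby` or `none`. HAProxy can also fail over on its own if the standby nodes are declared as `backup` servers with health checks enabled, which may be simpler than scripting the switch.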

The process of bringing them up is pretty much manual, and I want to make it more automatic.

The primary cluster takes a snapshot every day at 12 AM, which is dumped to our TrueNAS. It is automatically replicated to a mount point on the same TrueNAS system, which the standby can access so it can be used if we need to restore the cluster.
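A sketch of the snapshot side under a few assumptions: the primary names its nightly snapshot with the same `YYYY_MM_DD` stamp the restore script searches for, and on the standby the repository `sjc_es_data` (the name used in the restore script) points at the replicated mount `/es_snapshot/replica/production_snap`. The `daily_` prefix, the function names and the primary's repository name are hypothetical. Registering the mount as a read-only `fs` repository keeps the standby from ever writing into the primary's repository; the location must also be listed under `path.repo` in `elasticsearch.yml` on every standby node.

```shell
#!/bin/bash
# Sketch only: names other than sjc_es_data and the mount path are assumptions.

# Snapshot name for a given day stamp (YYYY_MM_DD), defaulting to today.
snapshot_name() {
  local day=${1:-$(date '+%Y_%m_%d')}
  echo "daily_$day"
}

# JSON body for a read-only shared-filesystem repository at the given path.
readonly_repo_body() {
  printf '{"type":"fs","settings":{"location":"%s","readonly":true}}' "$1"
}

# On the primary (e.g. from cron at 00:00): take today's snapshot and wait for it.
take_nightly_snapshot() {
  local es_url=$1 repo=$2
  curl -s -XPUT "$es_url/_snapshot/$repo/$(snapshot_name)?wait_for_completion=true"
}

# On the standby (run once): register the replicated mount as read-only.
register_replica_repo() {
  local es_url=$1
  curl -s -XPUT "$es_url/_snapshot/sjc_es_data" \
       -H 'Content-Type: application/json' \
       -d "$(readonly_repo_body /es_snapshot/replica/production_snap)"
}
```

On the primary this could run as `take_nightly_snapshot http://localhost:9200 <repo>` from a midnight cron entry; on the standby, `register_replica_repo http://localhost:9200` only needs to run once.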

The standby cluster also takes a snapshot of its data, which is dumped to a different location on the TrueNAS; this one is not replicated to the primary.

I was wondering whether ES has some mechanism for this, or if anyone has suggestions on how I can accomplish it.

I created a script, triggered by a cron job every day at 6:00 AM on the standby cluster, that removes all open indices and restores the snapshot from the primary cluster.

#!/bin/bash
# Runs from cron at 06:00 on the standby cluster: deletes the open indices
# and restores today's snapshot taken by the primary cluster.

# Variables
ES_URL="http://$(/usr/bin/facter ipaddress):9200"
REPO="sjc_es_data"
SNAP_DIR="/es_snapshot/replica/production_snap"
LOG_FILE="/var/log/elasticsearch/es_snapshot_restore_from_primary.log"
DATE_LOG=$(date +%Y_%m_%d_%T)
INDICES=(index1 index2 index3 index4l)
RESULT=()

# Today's snapshot name: repository files are named snap-<name>.dat;
# if there are several from today, keep the newest one
SNAPSHOT=$(/bin/ls -rt "$SNAP_DIR" | /bin/grep "^snap-.*$(date '+%Y_%m_%d')" | /usr/bin/tail -n 1 | /bin/sed -e 's/^snap-//' -e 's/\.dat$//')

# All open indices on this cluster (deleted before the restore)
CURRENT=$(/usr/bin/curl -s "$ES_URL/_cat/indices?h=status,index" | /bin/awk '$1 == "open" {print $2}')

# Restore function
AUTO_RESTORE () {
  if [ -z "$SNAPSHOT" ]; then
    echo "$DATE_LOG -- No snapshot found to restore" >> "$LOG_FILE"
    exit 1
  fi

  echo "$DATE_LOG -- Checking status of snapshot $SNAPSHOT" >> "$LOG_FILE"
  SNAP_STATUS=$(/usr/bin/curl -s "$ES_URL/_snapshot/$REPO/$SNAPSHOT/_status")
  if echo "$SNAP_STATUS" | /bin/grep -qi '"SUCCESS"'; then
    echo "$DATE_LOG -- Restoring snapshot $SNAPSHOT" >> "$LOG_FILE"
    /usr/bin/curl -s -XPOST "$ES_URL/_snapshot/$REPO/$SNAPSHOT/_restore" >> "$LOG_FILE"
    exit 0
  else
    echo "$DATE_LOG -- Snapshot $SNAPSHOT could not be restored. [FAIL]" >> "$LOG_FILE"
    echo "$SNAP_STATUS" >> "$LOG_FILE"
    exit 1
  fi
}

# Look for open indices among the ones we care about
for i in "${INDICES[@]}"; do
  RESULT+=($(/usr/bin/curl -s "$ES_URL/_cat/indices/$i?h=status"))
done

# Delete the open indices, then restore the snapshot from the primary
if printf '%s\n' "${RESULT[@]}" | /bin/grep -q open; then
  echo "$DATE_LOG -- There are open indices" >> "$LOG_FILE"
  echo "$DATE_LOG -- The following indices will be deleted before restoring the snapshot:" $CURRENT >> "$LOG_FILE"
  for index in $CURRENT; do
    /usr/bin/curl -s -XDELETE "$ES_URL/$index" >> "$LOG_FILE"
    sleep 2
  done
fi
AUTO_RESTORE
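The script deletes the open indices before it checks whether the snapshot is usable, so a missing or failed snapshot leaves the standby empty. One alternative, sketched here under assumptions, is to verify the snapshot first, close (rather than delete) the indices, and restore with `wait_for_completion=true`. Elasticsearch can restore over an existing index only if that index is closed and has the same number of shards as the index in the snapshot; if the shard counts differ, deleting is still required. `ES_URL` is a placeholder, and `snapshot_ok`, `close_index`, `restore_snapshot` and `safe_restore` are hypothetical names.

```shell
#!/bin/bash
# Sketch: verify, then close, then restore. ES_URL is an assumption.
ES_URL="http://localhost:9200"
REPO="sjc_es_data"

# Succeeds only if the snapshot's _status reports state SUCCESS.
snapshot_ok() {
  curl -s "$ES_URL/_snapshot/$REPO/$1/_status" | grep -q '"state":"SUCCESS"'
}

# Closes one index so the restore can overwrite it.
close_index() {
  curl -s -XPOST "$ES_URL/$1/_close"
}

# Restores the snapshot and blocks until the restore has finished.
restore_snapshot() {
  curl -s -XPOST "$ES_URL/_snapshot/$REPO/$1/_restore?wait_for_completion=true"
}

# Usage: safe_restore <snapshot> <index>...
# Returns 1 and leaves every index untouched if the snapshot is not usable.
safe_restore() {
  local snapshot=$1; shift
  snapshot_ok "$snapshot" || return 1
  local index
  for index in "$@"; do
    close_index "$index"
  done
  restore_snapshot "$snapshot"
}
```

With the variables from the script, this would be called as `safe_restore "$SNAPSHOT" "${INDICES[@]}"`; the index list should cover every index contained in the snapshot. If the restore itself fails, the closed indices are still on disk and can be reopened with `POST /<index>/_open`.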

By the way, this is Elasticsearch version 2.3.1.

