Hi all, I was recently exploring how to synchronize data between two Elasticsearch clusters running different versions, and I came across a great open-source tool on GitHub called INFINI Gateway.
According to the official description, INFINI Gateway is a reverse proxy for Elasticsearch that supports traffic control, query result caching, request logging and analysis, and traffic replication.
Traffic replication means the Gateway replicates the traffic it receives to multiple clusters, which can run different versions of Elasticsearch or even OpenSearch. That is impressive.
I had also considered using a message queue to keep multiple ES clusters in sync, but I think INFINI Gateway's approach is lighter and more extensible, and it offers other features I may use in the future.
Now, let me briefly talk about how I used INFINI Gateway to achieve data synchronization.
- Download
Download the INFINI Gateway binary for your platform.
- Modify the INFINI Gateway configuration file
The default configuration file is not set up for traffic replication, so download the replication configuration file (replication_via-disk.yml) from GitHub and adjust it to your environment.
I only modified the resource definition section at the top:
#primary
PRIMARY_ENDPOINT: http://192.168.56.3:7171
PRIMARY_USERNAME: elastic
PRIMARY_PASSWORD: password
PRIMARY_MAX_QPS_PER_NODE: 10000
PRIMARY_MAX_BYTES_PER_NODE: 104857600 # 100 MiB/s
PRIMARY_MAX_CONNECTION_PER_NODE: 200
PRIMARY_DISCOVERY_ENABLED: false
PRIMARY_DISCOVERY_REFRESH_ENABLED: false
#backup
BACKUP_ENDPOINT: http://192.168.56.3:9200
BACKUP_USERNAME: admin
BACKUP_PASSWORD: admin
BACKUP_MAX_QPS_PER_NODE: 10000
BACKUP_MAX_BYTES_PER_NODE: 104857600 # 100 MiB/s
BACKUP_MAX_CONNECTION_PER_NODE: 200
BACKUP_DISCOVERY_ENABLED: false
BACKUP_DISCOVERY_REFRESH_ENABLED: false
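Because the two clusters run different versions, it can help to confirm that both endpoints are reachable with the credentials from the config before starting the Gateway. The following is a minimal Python sketch (standard library only) that calls GET / on each cluster; both Elasticsearch and OpenSearch answer that request with a JSON document containing version.number. The CLUSTERS mapping just copies the values above and is an assumption you should adjust to your own setup.

```python
import base64
import json
import urllib.request

# Assumption: endpoints and credentials copied from the resource section above.
CLUSTERS = {
    "primary": ("http://192.168.56.3:7171", "elastic", "password"),
    "backup": ("http://192.168.56.3:9200", "admin", "admin"),
}


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"


def parse_version(body: bytes) -> str:
    """Extract version.number from the JSON returned by GET /."""
    return json.loads(body)["version"]["number"]


def fetch_version(endpoint: str, username: str, password: str, timeout: float = 5.0) -> str:
    """Call GET / on a cluster and return its reported version."""
    req = urllib.request.Request(endpoint + "/")
    req.add_header("Authorization", basic_auth_header(username, password))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_version(resp.read())


if __name__ == "__main__":
    for name, (endpoint, user, pwd) in CLUSTERS.items():
        try:
            print(f"{name}: {endpoint} -> version {fetch_version(endpoint, user, pwd)}")
        except OSError as exc:  # URLError, HTTPError and timeouts are all OSErrors
            print(f"{name}: {endpoint} unreachable ({exc})")
```

If either cluster is reported as unreachable, fix the endpoint or credentials in the config before running the Gateway.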
- Run
./gateway-linux-amd64 -config replication_via-disk.yml
- Testing
Send your bulk requests to the Gateway's endpoint, which defaults to http://Your-IP:18000.
If the PRIMARY executes the bulk request successfully, the Gateway then replicates it to the BACKUP. Otherwise, the Gateway returns the PRIMARY's error to the client that sent the request, and nothing is replicated.
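To exercise the write path, here is a minimal Python sketch (standard library only) that builds an NDJSON _bulk body and POSTs it to the Gateway. The Your-IP placeholder, the index name you pass in, and the optional basic-auth credentials are assumptions; whether the Gateway expects credentials depends on your configuration.

```python
from __future__ import annotations

import base64
import json
import urllib.request

# Assumption: replace Your-IP with the host running the Gateway.
GATEWAY = "http://Your-IP:18000"


def build_bulk_body(index: str, docs: list[dict]) -> bytes:
    """Encode documents as an NDJSON _bulk payload: one action line, then one source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def send_bulk(body: bytes, username: str | None = None, password: str | None = None) -> dict:
    """POST a _bulk body to the Gateway and return the parsed JSON response."""
    req = urllib.request.Request(
        GATEWAY + "/_bulk",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )
    if username is not None:
        # Optional: only needed if your setup expects basic auth on the Gateway.
        token = base64.b64encode(f"{username}:{password or ''}".encode()).decode()
        req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())
```

For example, send_bulk(build_bulk_body("replication-test", [{"msg": "hello"}])) sends one document; checking the "errors" field of the response tells you whether the PRIMARY accepted the write, and therefore whether it will be replicated.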
Queries sent to the Gateway are forwarded to the PRIMARY for execution, and the results are returned to the client.
Interestingly, if the PRIMARY is unavailable when you send a query, the Gateway forwards it to the BACKUP instead and returns the BACKUP's results.
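The write and read behaviour described above can be summarized with a small in-memory Python model. This is a toy sketch of the observed semantics, not INFINI Gateway's implementation: ToyCluster, ToyGateway and replication_queue are hypothetical names, and the explicit replicate() step only stands in for whatever buffering the Gateway uses before replaying writes to the BACKUP (the config file name suggests it is disk-backed).

```python
from dataclasses import dataclass, field


class ClusterUnavailable(Exception):
    """Raised by a toy cluster that is marked as down."""


@dataclass
class ToyCluster:
    """Hypothetical in-memory stand-in for an Elasticsearch/OpenSearch cluster."""
    name: str
    available: bool = True
    reject_writes: bool = False
    docs: list = field(default_factory=list)

    def bulk(self, docs: list) -> dict:
        if not self.available:
            raise ClusterUnavailable(self.name)
        if self.reject_writes:
            return {"errors": True, "cluster": self.name}
        self.docs.extend(docs)
        return {"errors": False, "cluster": self.name}

    def search(self) -> dict:
        if not self.available:
            raise ClusterUnavailable(self.name)
        return {"cluster": self.name, "hits": list(self.docs)}


@dataclass
class ToyGateway:
    """Hypothetical model of the PRIMARY/BACKUP routing described above."""
    primary: ToyCluster
    backup: ToyCluster
    replication_queue: list = field(default_factory=list)

    def bulk(self, docs: list) -> dict:
        try:
            result = self.primary.bulk(docs)
        except ClusterUnavailable:
            return {"errors": True, "cluster": self.primary.name, "reason": "unavailable"}
        if not result["errors"]:
            # Only writes the PRIMARY accepted are queued for the BACKUP.
            self.replication_queue.append(docs)
        # The client always receives the PRIMARY's response.
        return result

    def replicate(self) -> None:
        # Replay queued writes in order; if the BACKUP is down, the exception
        # propagates and the current batch stays at the head of the queue.
        while self.replication_queue:
            self.backup.bulk(self.replication_queue[0])
            self.replication_queue.pop(0)

    def search(self) -> dict:
        # Queries go to the PRIMARY and fall back to the BACKUP if it is unavailable.
        try:
            return self.primary.search()
        except ClusterUnavailable:
            return self.backup.search()
```

In this model a write rejected by the PRIMARY never reaches the BACKUP, and a search against a downed PRIMARY is answered from the BACKUP, matching the behaviour described above.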
I hope this helps anyone with the same need.