Add ability to cache HTTP redirects in Elasticsearch output

Hi everyone! I'm a software engineer at IBM in the logging service. We run several large multi-tenant Elasticsearch clusters to support both internal and external customers. An issue we're running into is that we need the ability, for various reasons, to migrate tenants from one cluster to another (often in different data centers).

Our multi-tenant solution uses a custom-written ES proxy that authenticates clients and applies capping, throttling, and blocking based on the state of the tenant's account. As part of our solution to tenant migration, we're implementing what we're calling "redirect caching" in both libbeat and the logstash-output-elasticsearch plug-in. This would allow our ES proxy servers to move a tenant's connections from one data center to another automatically, without the tenant having to reconfigure their log shippers.

We would like feedback as to whether this would be something the community would be interested in having included in the Elasticsearch output for Beats and Logstash. It would benefit us to have it included so that our customers could use "off-the-shelf" shippers instead of us having to maintain a fork of Beats and Logstash.

Below is a description of how redirect caching would work. And here are the links to the code for both Beats and the logstash-output-elasticsearch plugin:

Let me know what you think about the idea. Thanks!

Ray Harris, Software Engineer
IBM Cloud

HTTP Redirect Caching

As part of our investigation into implementing tenant migration, we looked into giving the log shippers the ability to cache HTTP redirects, so that follow-on HTTP requests during that session go straight to the redirect target instead of the originally configured endpoint.

As an example, consider this. A tenant is using the logging service in data center 1 (dc1). Without redirect caching, the process of shipping a log is as follows:

  • The log ingestion endpoint is logs.dc1.example.com
  • The shipper looks up the IP of the endpoint in DNS
  • The shipper connects to the IP using HTTPS and basic authentication
  • The server checks the tenant's account and returns a 200 if allowed to send logs
  • The shipper starts sending logs

If we want to migrate the tenant to a different data center (e.g., dc2), without redirect caching this is the process:

  • Behind the scenes, we migrate the tenant's existing logs to the new data center
  • For a short time we stream their logs from dc1 to dc2
  • We allow the tenant to ship logs to either dc1 or dc2
  • The tenant has to change the endpoints of their shippers to logs.dc2.example.com
  • We block the client from connecting to dc1 and they can only connect to dc2

This process does not scale and is error-prone. With redirect caching, the process of shipping a log changes slightly, but it allows the migration to be automated. Here's how redirect caching would work:

  • The log ingestion endpoint is logging.example.com
  • The shipper looks up the IP of the endpoint in DNS
  • The shipper connects to the IP using HTTPS and basic authentication
  • The proxy looks up the tenant and determines their current data center
  • The response to the initial connection is an HTTP 302 redirect to the correct data center
  • The shipper caches the redirect
  • The shipper connects to the redirect using HTTPS and basic authentication
  • The server checks the tenant's account and returns a 200 if allowed to send logs
  • The shipper starts sending logs

Now, with redirect caching, if we want to migrate a tenant to dc2, this is the process:

  • Behind the scenes, we migrate the tenant's existing logs to the new data center
  • We stream their logs from dc1 to dc2 until the logs are in sync
  • We change the tenant's account to indicate they are now in dc2
  • The servers in dc1 are informed of the change in data center for the tenant
  • The servers in dc1 close all active connections from the tenant's shippers
  • The shippers reconnect, but this time instead of a 200, they get a redirect to dc2 as described above
  • The shippers now connect and send logs to dc2

The process with redirect caching is scalable and can be automated. The migration can be initiated by the tenant or by the logging service either manually or automatically. The tenant doesn't have to reconfigure their shippers (which often involves redeploying servers or containers).
