Title: Implementing S3 Buffer Layer for Elasticsearch - Looking for Community Feedback


Hi everyone,

We're evaluating an S3-based buffer architecture to address circuit breaker issues and reduce costs in our ECK deployment. Would really appreciate insights from anyone who's implemented similar patterns.

The Challenge

Running ECK 8.14 on AWS EKS with significant stability issues:

  • Circuit breakers triggering 2-3 times daily during traffic spikes (63GB variations)
  • 75-78% heap pressure causing cluster instability
  • ~$2000/month PVC storage costs
  • Data loss during Elasticsearch downtime

Proposed Architecture

Moving from direct ingestion to S3-buffered approach:

The architecture uses S3 as an intermediate buffer layer between data sources and Elasticsearch.

Key Components:

Data Collection:

  • FluentD Forwarders (DaemonSet) collect logs from pods
  • FluentD Aggregator (StatefulSet) writes to S3
  • OpenTelemetry Collector exports APM data to S3

Buffer Layer:

  • S3 bucket with lifecycle policies
  • SNS notifications on new objects
  • SQS queue for rate-controlled processing

Processing:

  • FluentD/Logstash polls SQS
  • Batch processes S3 objects
  • Controlled indexing to Elasticsearch
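
To make the processing flow concrete, here is a minimal sketch of an SQS-driven consumer written directly against boto3 and the Elasticsearch Python client. The queue URL, endpoint, credentials, and index name are placeholders, and in practice FluentD, Logstash, or Elastic Agent's aws-s3 input would play this role:

import gzip
import json

import boto3
from elasticsearch import Elasticsearch, helpers

# Placeholders: point these at your own queue and cluster.
QUEUE_URL = "https://sqs.eu-west-3.amazonaws.com/123456789012/observability-buffer"
es = Elasticsearch("https://elasticsearch.example:9200", api_key="...")
sqs = boto3.client("sqs", region_name="eu-west-3")
s3 = boto3.client("s3", region_name="eu-west-3")

def docs_from_object(bucket, key):
    """Yield one document per line from a gzipped NDJSON object."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    for line in gzip.decompress(body).splitlines():
        yield {"_index": "logs-buffered", "_source": json.loads(line)}

while True:
    # Long-poll SQS; batch size and wait time bound the indexing rate.
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        event = json.loads(msg["Body"])
        if "Message" in event:          # unwrap the SNS envelope if present
            event = json.loads(event["Message"])
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            helpers.bulk(es, docs_from_object(bucket, key))
        # Delete only after indexing succeeds, so a failed batch becomes
        # visible again on the queue and is retried.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

Because SQS hands out a bounded number of objects per poll, the indexing rate into Elasticsearch stays controlled even when the producers spike.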

Design Considerations

1. Latency Trade-off

Moving from ~2 seconds (direct) to 30-60 seconds (buffered). Is this acceptable for most use cases? Our monitoring team is okay with it for logs, but uncertain about APM data impact.

2. S3 Data Format

s3://observability-buffer/
├── logs/dt=2025-01-18/hour=14/*.json.gz
├── traces/dt=2025-01-18/hour=14/*.otlp.gz  
└── metrics/dt=2025-01-18/hour=14/*.otlp.gz

Should we use JSON, Parquet, or native OTLP format?
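
Whatever the answer, the partition layout above is easy to produce; as a sketch, assuming gzipped NDJSON and with bucket/prefix names as placeholders:

import gzip
import json
import uuid
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3", region_name="eu-west-3")

def put_batch(bucket, events, signal="logs"):
    """Write one batch as a gzipped NDJSON object under a dt=/hour= key."""
    now = datetime.now(timezone.utc)
    key = f"{signal}/dt={now:%Y-%m-%d}/hour={now:%H}/{uuid.uuid4()}.json.gz"
    ndjson = "\n".join(json.dumps(e) for e in events).encode()
    s3.put_object(Bucket=bucket, Key=key, Body=gzip.compress(ndjson))
    return key

Gzipped NDJSON keeps the consumer trivial; Parquet compresses better and can be queried in place (e.g. with Athena), while keeping traces/metrics as raw OTLP avoids re-encoding the APM data.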

3. Cost Projection

  • Current: ~$2000/month PVC + oversized cluster
  • Projected: ~$150/month S3 + smaller cluster
  • Savings: ~40-50%

Are these realistic?
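
For the storage line item, a rough back-of-the-envelope check (the S3 price and retention window are assumptions, and request/transfer charges and the cluster itself are excluded):

# Rough, assumption-laden sanity check of the S3 storage cost only.
daily_gb = 63            # current observed daily volume
retention_days = 30      # assumed buffer retention before lifecycle expiry
gb_month_price = 0.023   # approx. S3 Standard $/GB-month; varies by region

stored_gb = daily_gb * retention_days
print(f"~{stored_gb} GB retained -> ~${stored_gb * gb_month_price:.0f}/month S3 storage")
# => ~1890 GB retained -> ~$43/month S3 storage

PUT/GET requests, SQS, and any cross-region replication come on top of that, which is presumably where the ~$150 estimate lands; the overall 40-50% figure depends mostly on how far the cluster (and its PVCs) can actually shrink.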

4. OpenTelemetry Concerns

Will buffering APM data through S3:

  • Break trace correlation?
  • Impact service maps in Kibana?
  • Affect ML anomaly detection?

Questions for the Community

  1. Production Experience: Has anyone run S3 buffering at scale (50+ GB/day)? What issues did you encounter?

  2. Processing Layer: What works best for S3→Elasticsearch?

    • FluentD with S3 input plugin
    • Logstash with S3 input
    • Elastic Agent S3 input (Beta)
  3. Alternative Approaches: Should we consider:

    • Elasticsearch frozen tier with searchable snapshots?
    • Kafka (concerns about operational overhead)
    • Direct indexing with better circuit breaker tuning?
  4. Elasticsearch 9.x: Are there features in newer versions that would eliminate the need for this architecture?

  5. Best Practices:

    • Optimal S3 object size?
    • SQS batching strategies?
    • Handling out-of-order events?

Current Setup

  • ECK 8.14 on AWS EKS (eu-west-3)
  • 3 data nodes, 2 masters
  • OpenTelemetry Collector 0.128.0
  • FluentD
  • Daily volume: 63+ GB of logs, plus APM data

Reference

Similar to JustEat's approach but adapted for Kubernetes and OpenTelemetry.

Would love to hear your thoughts, especially if you've tried something similar or see potential issues with this approach.

Thanks!


What is the daily volume you have for APM data? Also, is this 63 GB or TB? 63 GB per day is pretty small for Elasticsearch; using S3 as a buffer in this case seems unnecessary, and you may be able to fix your performance issues in other ways.

What issue exactly are you having, and what are the resources of your nodes?

Also, how did you determine that your issue is related to indexing and not to search?

Having only 2 master nodes is not good. You should always aim to have 3 master-eligible nodes in the cluster.

1 Like

I agree with Leandro, this sounds like pretty light load, you shouldn't need anything so complex to deal with it. You already have some client-side buffering (log files are naturally buffered anyway and APM traces should be buffered in the collector) to smooth out any peaks, but maybe you need to adjust the config in this area to make better use of it. That's definitely what I'd investigate before introducing so much other operational complexity into the system.
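
To illustrate what client-side buffering buys you: when Elasticsearch pushes back with 429s (the circuit-breaker / queue-full case), a well-configured shipper retries with backoff instead of dropping data. A sketch with the Python client's bulk helper, purely for illustration (endpoint and index are placeholders; in a FluentD/OTel setup this lives in their buffer and retry settings):

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("https://elasticsearch.example:9200", api_key="...")

def index_with_backoff(events):
    actions = ({"_index": "logs-buffered", "_source": e} for e in events)
    # streaming_bulk retries documents rejected with 429 using exponential
    # backoff rather than failing the whole batch.
    for ok, item in helpers.streaming_bulk(
        es, actions, chunk_size=1000, max_retries=8, initial_backoff=2, max_backoff=120
    ):
        if not ok:
            print("permanent failure:", item)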

8.14 is over a year old and there have been improvements in this area since then - at least #113044 applies more effective backpressure when overwhelmed by spikes in indexing. You're due an upgrade.

1 Like

Hello @DavidTurner and @leandrojmp, thank you for your responses.

The 63 GB is from just 5 test deployments.
In production (and the same goes for pre-production, which can be even more), we're looking at:

  • 500+ customer deployments across multiple EKS clusters
  • 20 microservices per customer
  • Expected 1TB+/day (possibly 2-3TB with growth)
  • Multi-region architecture (multiple EKS clusters)
  • Cross-region replication requirements

At this scale, the S3 buffer architecture becomes more compelling for several reasons:

  1. Cost: PVC costs across multiple regions at TB scale vs S3 is significant
  2. Multi-region: S3 replication is far simpler than Elasticsearch CCR
  3. Blast radius: Buffer isolates Elasticsearch from ingestion spikes
  4. Scalability: S3 scales infinitely without pre-provisioning

Given this context, would you still recommend direct ingestion, or does the S3 approach make more sense?

I think I'm missing something here - you're proposing S3 as a temporary buffer for the data on its way into Elasticsearch, but the PVCs in the ES cluster relate to the permanent storage of the data which would be the same either way.

Searchable snapshots will be worthwhile at this kind of scale.
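
As a sketch of what that could mean (repository name, timings, and the Python-client form are illustrative only, and searchable snapshots need the appropriate license tier):

from elasticsearch import Elasticsearch

es = Elasticsearch("https://elasticsearch.example:9200", api_key="...")

# Hypothetical ILM policy: recent data stays on the PVC-backed hot nodes,
# older indices are mounted from an S3 snapshot repository as searchable
# snapshots, and everything is deleted after 30 days.
es.ilm.put_lifecycle(
    name="logs-frozen-after-7d",
    policy={
        "phases": {
            "hot": {
                "actions": {"rollover": {"max_primary_shard_size": "50gb", "max_age": "1d"}}
            },
            "frozen": {
                "min_age": "7d",
                "actions": {"searchable_snapshot": {"snapshot_repository": "s3-snapshots"}},
            },
            "delete": {"min_age": "30d", "actions": {"delete": {}}},
        }
    },
)

Old indices are then served from the S3-backed repository with only a local cache on the nodes, which addresses the PVC-cost and retention concerns without putting S3 into the ingest path.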

Are you sure you need replication, or do you just want to do cross-region searches? Cross-cluster search is much simpler, and it's how we handle this kind of global data from our own internal systems within Elastic. FWIW our internal systems don't have this kind of separate buffering layer; it's all done with client-side buffering.

1 Like

Your point about cross-cluster search is valid, but we still need to solve:

  • How to prevent traffic spikes from overwhelming ES?
  • How to reduce PVC costs when we need 30-day or more retention?
  • How to handle ES downtime without losing data?

Simple goal: Reduce PVC costs and prevent circuit breakers in our multi-region setup:

  • Smaller ES clusters sized for average load
  • Cheaper storage for historical data
  • No circuit breakers
  • No data loss

Hello @leandrojmp, and thank you.
Those 63 GB are from a test environment with 5 separate deployments of our 20-microservice solution.

We expect much more data, and we will run multiple EKS clusters.

My concern is PVC cost (we are deploying our eck-stack on EKS). Those PVCs are backed by PVs (EBS), so we expect costs to increase. That's why I am thinking about a workaround: using AWS S3 not only as a buffer layer but also as our primary storage, since we can control ingestion via SNS/SQS and reduce the retention period on our PVCs.
It could also be a workaround for multi-region observability.

What do you think?
@DavidTurner, thank you as well for your quick responses.

Thank you a lot.

Thank you @Christian_Dahlqvist,

In fact it's 1 master node with 3 data nodes, and it's a test environment.

When we move to production and QA we will use 3 master nodes as you said, but for now I am testing and researching with minimal resources, so that when we scale up we can run the stack with the best possible optimization.

As you are an expert, what are your recommendations for performance tuning and memory management of Elasticsearch nodes (master and data)?
Also, what are the best practices when configuring indices?

Thank you for your quick responses.

I don't see how S3 would have any impact on this. Your primary storage would be the storage of your Elasticsearch data nodes, and that cannot be on S3; also, for better performance you need fast storage, so you need something backed by fast disks like NVMe, at least for the hot tier.

It is not clear what exactly your issue is, as you didn't provide the specs of your cluster or the errors you are getting; there are many things you can do to tune the cluster.

Having a buffer between the source of the data and Elasticsearch is pretty common. I use Kafka as a buffer layer in some data ingestion flows, but not for everything; I also have thousands of agents sending data directly to Elasticsearch.

If you send your data to S3 buckets, you would also need an SQS queue configured to receive notifications of new files, and use the Elastic Agent to get the data and send it to Elasticsearch. This is a pretty common scenario; multiple services ship data directly to S3 buckets (Cloudflare, GitHub, AWS, etc.) and you can use Elastic Agent to consume it from there.

Without SQS you cannot consume your data in parallel, and depending on the volume you may need multiple agents to consume it, which would require multiple VMs/Pods.
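
For completeness, the bucket-to-queue notification wiring is a one-time setup step. A minimal boto3 sketch, with bucket name, queue ARN, and prefix as placeholders (the SQS queue policy must separately allow S3 to publish; with the SNS fan-out from the original proposal the chain would be S3 → SNS → SQS instead):

import boto3

s3 = boto3.client("s3", region_name="eu-west-3")

# Publish "object created" events for the logs/ prefix to the queue
# that the consumers (Elastic Agent, Logstash, etc.) poll.
s3.put_bucket_notification_configuration(
    Bucket="observability-buffer",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:eu-west-3:123456789012:observability-buffer",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "logs/"}]}},
            }
        ]
    },
)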

1 Like

On preventing traffic spikes from overwhelming ES: upgrade to pick up the improvement I linked above (amongst others), and also make sure there is sufficient client-side buffering instead.

On reducing PVC costs for 30-day retention: that is not really a question of buffering incoming data in S3. My recommendation would be searchable snapshots.

On handling ES downtime without losing data: client-side buffering, and avoiding ES downtime in the first place (e.g. by upgrading to pick up fixes and perf improvements).

2 Likes

Thank you for your feedback guys.