Hello everyone,
I am trying to better understand the tradeoffs between Elasticsearch Serverless, Data Streams, TSDS (`index.mode=time_series`), and future vector search workloads.
Current Situation
We currently have around 850 GB of searchable data in Elasticsearch Serverless spread across several large indices.
At the moment:
- all data appears to be treated as “search-ready/hot”
- our Search VCU baseline is currently 16 VCUs on the On-demand tier
- query latency is already somewhat variable
Our current workload is primarily:
- filtered search
- customer-facing feeds
- document retrieval APIs
Typical query ranges:
- 14 days
- 30 days
- 90 days
Goal
Our main objective is to reduce the amount of data considered “search-ready” in Serverless in order to lower the Search VCU baseline.
From discussions with Elastic support, my understanding is that regular indices are treated differently from Data Streams / time-aware indices regarding the Search Boost Window.
Because of this, we are evaluating whether we should convert around 7 large indices into:
- Data Streams
- or TSDS (`index.mode=time_series`)

so that older data falls outside the Search Boost Window rather than keeping the full ~850 GB permanently hot/search-ready.
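To make the plan concrete, here is a rough sketch of what I believe the conversion would involve (index, template, and field names are placeholders, and I have left out the TSDS-specific settings such as `index.mode` and `routing_path` since our data is not strictly metrics-shaped). My understanding is that reindexing into a data stream requires `"op_type": "create"` and that every document needs a `@timestamp`:

```
PUT _index_template/app-documents-template
{
  "index_patterns": ["app-documents*"],
  "data_stream": {},
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" }
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "old-large-index-1" },
  "dest": { "index": "app-documents", "op_type": "create" }
}
```

Please correct me if the migration path in Serverless differs from this.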
Future Plan
We are also planning to add large text embeddings (dense_vector) for semantic/hybrid search features.
This will likely increase storage size by 2–3x, making the Search VCU baseline even more important from a cost perspective.
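For the embeddings, I expect the mapping to look something like the sketch below (field name and dimensionality are placeholders). I understand quantized index types such as `int8_hnsw` can reduce the memory footprint of the HNSW graph, though I am not sure how that interacts with the VCU baseline:

```
PUT app-documents/_mapping
{
  "properties": {
    "content_embedding": {
      "type": "dense_vector",
      "dims": 768,
      "index": true,
      "similarity": "cosine",
      "index_options": { "type": "int8_hnsw" }
    }
  }
}
```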
Main Concerns
I understand that Data Streams create multiple backing indices/shards over time.
Because of this, I am trying to understand whether aggressive time partitioning is actually a good fit for future vector search workloads.
Questions
- For a future vector-heavy workload, is it better to use:
  - standard Data Streams,
  - TSDS (`index.mode=time_series`),
  - or neither — i.e. are standard indices actually the better fit?
- Since Data Streams create multiple backing indices/shards over time, does this become problematic for ANN/vector search because of shard fan-out?
- Has anyone observed how vector search latency behaves on the On-demand tier vs the Performant tier when querying data outside the Search Boost Window?
- If vectors are stored in older/cold backing indices outside the boost window, does latency spike significantly, or do warm compute resources in the Performant tier hide most of the cold-access penalty?
- For workloads that frequently query up to 90 days of data, is aggressively time-partitioning vector data still recommended?
Any real-world experience or architectural guidance would be greatly appreciated.