Elasticsearch Serverless + Vector Search + Data Streams / Time Series Tradeoffs

Hello everyone,

I am trying to better understand the tradeoffs between Elasticsearch Serverless, Data Streams, TSDS (index.mode=time_series), and future vector search workloads.

Current Situation

We currently have around 850 GB of searchable data in Elasticsearch Serverless spread across several large indices.

At the moment:

  • all data appears to be treated as “search-ready/hot”

  • our Search VCU baseline is currently 16 VCUs on the On-demand tier

  • query latency is already somewhat variable

Our current workload is primarily:

  • filtered search

  • customer-facing feeds

  • document retrieval APIs

Typical query time ranges:

  • 14 days

  • 30 days

  • 90 days

Goal

Our main objective is to reduce the amount of data considered “search-ready” in Serverless in order to lower the Search VCU baseline.

From discussions with Elastic support, my understanding is that regular indices are treated differently from Data Streams / time-aware indices regarding the Search Boost Window.

Because of this, we are evaluating whether we should convert around 7 large indices into:

  • Data Streams

  • or TSDS (index.mode=time_series)

so that older data falls outside the Search Boost Window rather than keeping the full ~850 GB permanently hot/search-ready.
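For concreteness, the conversion we are evaluating would look roughly like the template below (written from the docs; the template, pattern, and field names are placeholders, not our real ones). A plain data stream only needs the `data_stream` block; the TSDS variant additionally sets `index.mode: time_series` and requires at least one dimension field listed in `index.routing_path`:

```json
PUT _index_template/docs-template
{
  "index_patterns": ["docs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "time_series",
      "index.routing_path": ["customer_id"]
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "customer_id": { "type": "keyword", "time_series_dimension": true }
      }
    }
  }
}
```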

Future Plan

We are also planning to add large text embeddings (dense_vector) for semantic/hybrid search features.

This will likely increase storage size by 2–3x, making the Search VCU baseline even more important from a cost perspective.
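The embedding field we have in mind would be mapped along these lines (the field name and dimension count are illustrative; the embedding model is not decided yet):

```json
PUT docs/_mapping
{
  "properties": {
    "content_embedding": {
      "type": "dense_vector",
      "dims": 768,
      "index": true,
      "similarity": "cosine"
    }
  }
}
```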

Main Concerns

I understand that Data Streams create multiple backing indices/shards over time.

Because of this, I am trying to understand whether aggressive time partitioning is actually a good fit for future vector search workloads.

Questions

  1. For a future vector-heavy workload, is it better to use:

    • standard Data Streams

    • or TSDS (index.mode=time_series)?

    • or are standard indices still a better fit than either?

  2. Since Data Streams create multiple backing indices/shards over time, does this become problematic for ANN/vector search because of shard fanout?

  3. Has anyone observed how vector search latency behaves when querying data outside the Search Boost Window:

    • on the On-demand tier

    • vs. the Performant tier?

  4. If vectors are stored in older/cold backing indices outside the boost window:

    • does latency spike significantly?

    • or do warm compute resources on the Performant tier hide most of the cold-access penalty?

  5. For workloads querying up to 90 days frequently, is aggressively time-partitioning vector data still recommended?
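For context, a typical future query would combine kNN with a time filter, roughly like this (field names are placeholders and the query vector is abbreviated):

```json
POST docs-stream/_search
{
  "knn": {
    "field": "content_embedding",
    "query_vector": [0.12, -0.08, ...],
    "k": 10,
    "num_candidates": 100,
    "filter": {
      "range": { "@timestamp": { "gte": "now-90d" } }
    }
  }
}
```

The core of questions 2 and 5 is whether this kind of filtered ANN query stays fast once the data stream has accumulated many backing indices.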

Any real-world experience or architectural guidance would be greatly appreciated.