Hello everyone,
I am trying to better understand the tradeoffs between Elasticsearch Serverless, Data Streams, TSDS (index.mode=time_series), and future vector search workloads.
Current Situation
We currently have around 850 GB of searchable data (e.g., government docs, political social media posts, press, tenders, etc.) in Elasticsearch Serverless, spread across several large indices.
At the moment:
- all data appears to be treated as “search-ready/hot”
- our Search VCU baseline is currently 16 VCUs on the On-demand tier
- query latency is already somewhat variable
Our current workload is primarily:
- filtered search
- customer-facing feeds
- document retrieval APIs
Typical query ranges:
Goal
Our main objective is to reduce the amount of data considered “search-ready” in Serverless in order to lower the Search VCU baseline.
From discussions with Elastic support, my understanding is that regular indices are treated differently from Data Streams / time-aware indices regarding the Search Boost Window.
Because of this, we are evaluating whether we should convert around 7 large indices into:
so that older data falls outside the Search Boost Window rather than keeping the full ~850 GB permanently hot/search-ready.
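For the data stream option, a minimal Python sketch of what the conversion involves, under stated assumptions. The `data_stream: {}` object and a date-typed `@timestamp` field are what an Elasticsearch index template needs for a data stream, and the body would be sent with `PUT _index_template/<name>`; the pattern `gov-docs*`, the `title`/`embedding` fields, and the 384 dimensions are hypothetical. The helper `size_inside_window` is also hypothetical: it assumes, purely for illustration, that a backing index counts as search-ready when its newest `@timestamp` falls inside the window. It is not a description of how Serverless computes the boost window.

```python
import json
from datetime import datetime, timedelta, timezone

# Body for PUT _index_template/<name>: new indices matching the pattern become
# a data stream. "data_stream": {} plus a date-typed @timestamp are the data
# stream requirements; the pattern, fields, and dims are hypothetical.
TEMPLATE_BODY = {
    "index_patterns": ["gov-docs*"],
    "data_stream": {},
    "template": {
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "title": {"type": "text"},
                "embedding": {"type": "dense_vector", "dims": 384},
            }
        }
    },
}


def size_inside_window(backing_indices, now, window_days):
    """Total GB of backing indices whose newest @timestamp is within the window.

    Illustrative assumption only: a whole backing index counts as
    search-ready if its newest document falls inside the window.
    """
    cutoff = now - timedelta(days=window_days)
    return sum(size_gb for newest, size_gb in backing_indices if newest >= cutoff)


if __name__ == "__main__":
    now = datetime(2025, 6, 1, tzinfo=timezone.utc)
    # (newest @timestamp, size in GB) per backing index: made-up numbers.
    backing = [
        (now - timedelta(days=5), 20.0),
        (now - timedelta(days=35), 60.0),
        (now - timedelta(days=400), 770.0),
    ]
    print(json.dumps(TEMPLATE_BODY, indent=2))
    print(size_inside_window(backing, now, window_days=30))
```

With these made-up sizes, only 20 of the 850 GB would land inside a 30-day window under the sketch's assumption; the real saving depends on how much of the corpus is genuinely old and rarely queried.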
Future Plan
We are also planning to add large text embeddings (dense_vector) for semantic/hybrid search features.
This will likely increase storage size by 2–3x, making the Search VCU baseline even more important from a cost perspective.
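The 2–3x estimate is easy to turn into concrete numbers, and per-vector arithmetic shows where that growth comes from. A back-of-the-envelope Python sketch: the 1024-dimension size is a hypothetical embedding model, and `vector_bytes` counts raw vector bytes only, ignoring index structures and any full-precision copies kept for rescoring.

```python
def projected_storage_gb(current_gb, low_factor, high_factor):
    """(low, high) storage estimate after growing current_gb by each factor."""
    return current_gb * low_factor, current_gb * high_factor


def vector_bytes(dims, bits_per_dim):
    """Raw bytes for one vector at the given precision, ignoring index overhead."""
    return dims * bits_per_dim // 8


if __name__ == "__main__":
    print(projected_storage_gb(850, 2, 3))  # (1700, 2550) GB
    # 1024 dims is a hypothetical embedding size.
    print(vector_bytes(1024, 32))  # 4096 bytes as float32
    print(vector_bytes(1024, 1))   # 128 bytes at 1 bit per dimension
```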
Main Concerns
I understand that Data Streams create multiple backing indices/shards over time.
Because of this, I am trying to understand whether aggressive time partitioning is actually a good fit for future vector search workloads.
Questions
- For a future vector-heavy workload, is it better to use:
- Since Data Streams create multiple backing indices/shards over time, does this become problematic for ANN/vector search because of shard fanout?
- Has anyone observed how vector search latency behaves:
- If vectors are stored in older/cold backing indices outside the boost window:
- For workloads querying up to 90 days frequently, is aggressively time-partitioning vector data still recommended?
Any real-world experience or architectural guidance would be greatly appreciated.
I don't know much about data streams or TSDS myself, but I'm on the vector search team and have some knowledge there. Hopefully others will share as well.
For context, vector search now defaults to our newer DiskBBQ algorithm, which is an IVF-based solution using (by default) 1-bit quantization. Think of centroids that each represent a list of vectors: during exploration we search those representative, hierarchical centroids and only load the vectors we actually need to explore, typically only about 1%. It's about as small as you can get on disk and is very memory friendly. It also scales well with small amounts of resources (compared with graph-based solutions like HNSW). Questions on any of this are welcome.
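To make the centroid idea concrete, here is a toy Python sketch of IVF search over 1-bit codes. It is not DiskBBQ's implementation (which uses hierarchical centroids, a more careful quantizer, and an on-disk layout); a simple sign-of-residual code stands in for the real quantization. It only illustrates the shape described above: cluster vectors around centroids, keep a 1-bit code per vector, probe only the few centroids nearest the query, shortlist by cheap bit distance, and load full vectors only for the shortlist. `ToyIVF`, `n_probe`, and `rerank` are hypothetical names.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(vectors, k, iters=5):
    """Plain Lloyd's k-means seeded with the first k vectors (toy only)."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for i, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster empties
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def one_bit(v, centroid):
    """1 bit per dimension: is v at or above the centroid in that dimension?"""
    return tuple(1 if x >= c else 0 for x, c in zip(v, centroid))


def hamming(a, b):
    """Number of differing bits between two equal-length codes."""
    return sum(x != y for x, y in zip(a, b))


class ToyIVF:
    def __init__(self, vectors, n_lists):
        self.vectors = vectors  # full-precision vectors, "on disk"
        self.centroids = kmeans(vectors, n_lists)
        # Posting lists: centroid id -> [(doc id, 1-bit code of its residual)]
        self.lists = {i: [] for i in range(n_lists)}
        for doc_id, v in enumerate(vectors):
            c = nearest(v, self.centroids)
            self.lists[c].append((doc_id, one_bit(v, self.centroids[c])))

    def search(self, query, k=3, n_probe=2, rerank=10):
        # 1) Rank centroids by distance and explore only the closest n_probe.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: sq_dist(query, self.centroids[i]))
        candidates = []
        for c in order[:n_probe]:
            q_code = one_bit(query, self.centroids[c])
            for doc_id, code in self.lists[c]:
                candidates.append((hamming(q_code, code), doc_id))
        # 2) Shortlist by the cheap 1-bit distance.
        candidates.sort()
        shortlist = [doc_id for _, doc_id in candidates[:rerank]]
        # 3) Load full vectors only for the shortlist and rescore exactly.
        shortlist.sort(key=lambda d: sq_dist(query, self.vectors[d]))
        return shortlist[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
    index = ToyIVF(data, n_lists=8)
    print(index.search(data[42], k=3))
```

With `n_probe=2` of 8 lists, a query touches roughly a quarter of this toy dataset and rescores at most `rerank` full vectors; at real scale, with many more centroids, the explored fraction becomes far smaller, which is the effect the reply describes.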
future vector-heavy workload, is it better to use
We are in the middle of considering adding a vector-search-specific index mode. The intention is to explore more aggressive merging so that we search fewer small vector indices (segments). Stay tuned a bit here. Most of our testing has been on standard indices, but I'm not aware of any reason it wouldn't work with data-stream-based workflows or TSDS.
does this become problematic for ANN/vector search because of shard fanout
I don't expect it to be problematic, but like all good things, it depends. Vector search will likely be more performant if all the data queried by your primary use cases is in a single index rather than spread across indices. Under the hood, this mostly works by querying the aforementioned centroids, so if you have a lot of small indices, shards, or segments, you wind up exploring unnecessarily. We are actively working to improve this, up to the index level, by merging segments more aggressively and searching more effectively across shards (stopping the search earlier). As long as you aren't creating overly small indices with data streams, I'd expect you to be fine.
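A toy cost model helps show why many small partitions lead to unnecessary exploration. This is a simplification, not how Elasticsearch schedules vector search: it assumes vectors are spread evenly, each partition (index, shard, or segment) has its own centroids of roughly `vectors_per_centroid` vectors, and each partition independently probes a fixed `n_probe` centroids with no early stopping across partitions. The function and its parameters are hypothetical.

```python
import math


def vectors_scored(total_vectors, num_partitions, vectors_per_centroid, n_probe):
    """Approximate number of vectors scored by one query (toy model).

    Each partition searches its own centroids independently and probes
    n_probe of them (or all, if it has fewer), scoring every vector in
    each probed centroid's list.
    """
    per_partition = total_vectors / num_partitions
    centroids = max(1, math.ceil(per_partition / vectors_per_centroid))
    probed = min(n_probe, centroids)
    list_size = per_partition / centroids
    return num_partitions * probed * list_size


if __name__ == "__main__":
    for parts in (1, 10, 1000):
        print(parts, vectors_scored(1_000_000, parts, 1_000, 10))
```

Under these assumptions, 1,000,000 vectors in one partition score about 10,000 vectors per query; split across 10 partitions that becomes about 100,000, and across 1,000 tiny partitions every vector is scored. Stopping earlier across shards, as mentioned in the reply, is one way to cut this kind of overhead.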
vector search latency behaves
For both the On-demand and Performant tiers we'll scale down (just not as much on Performant), so there's a bit of a warming period if you don't have continuous usage. That leads to some initial latency, which is where the boost windows can really help. We are actively working here as well, so stay tuned; I think the hope is to eliminate or reduce the need for warming. Once scaled up, latencies are good and equivalent to running your own machines; specifically from a vector search standpoint, there isn't much difference. I expect the latency variability you are experiencing now to smooth out a good bit with upcoming work too.
vectors are stored in older/cold backing indices outside the boost window
Right now, latency will spike a bit during warm-up, but it goes away quickly; as you can imagine, machines are spinning up, particularly if a large load arrives suddenly. In my experience the Performant tier can help, but it sort of sets the floor at which you can run. Really large vector search workflows today need some additional warming time, and we know it. Again, stay tuned here; several folks are working in this space.
For workloads querying up to 90 days frequently, is aggressively time-partitioning vector data still recommended?
I think if your queries can be strictly time-bounded, then time partitioning is great. Most workloads I see nowadays are disk partitioned, but I think that's because query use cases seem to vary a good bit across those time boundaries.
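If the primary queries really are bounded to, say, the last 90 days, that bound can be expressed as a filter on the kNN clause so that only matching documents are candidates. A minimal Python sketch that builds the request body: `knn`, `field`, `query_vector`, `k`, `num_candidates`, `filter`, and the `now-90d/d` date math are standard Elasticsearch query DSL, while the field names `embedding`, `title`, and `@timestamp` and the helper `knn_body` are assumptions for illustration. The body would be sent to `POST <data-stream-or-index>/_search`.

```python
import json


def knn_body(query_vector, days, k=10, num_candidates=100, field="embedding"):
    """Build a kNN search body restricted to the last `days` days.

    Field names ("embedding", "@timestamp", "title") are hypothetical.
    """
    return {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
            # Applied during the approximate kNN search, so the k hits
            # returned all satisfy the time bound.
            "filter": {"range": {"@timestamp": {"gte": f"now-{days}d/d"}}},
        },
        "_source": ["title", "@timestamp"],
    }


if __name__ == "__main__":
    print(json.dumps(knn_body([0.12, -0.4, 0.33], days=90), indent=2))
```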
And actually, the first PR for the vector index mode just went up, if you are interested in some of the direction and intention there: Introduce vector index mode by mayya-sharipova · Pull Request #148789 · elastic/elasticsearch · GitHub
Thank you for your detailed reply and for sharing insights about the current and upcoming vector search improvements. I will definitely keep an eye on the areas you mentioned, especially around segment merging, warming behaviour, and future vector-specific optimisations.
After doing more research and reviewing our workload characteristics, I believe that Data Streams or TSDS are probably not the right fit for our use case. Our platform primarily deals with political and governmental data (government documents, parliamentary data, press articles, social media content, etc.), and many of these documents are updated frequently rather than being strictly append-only.
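That mismatch is concrete: data streams are designed for append-only data, so new documents can only be added with the `create` operation, and changing an existing document means targeting the specific backing index that holds it (or using update/delete by query). A minimal Python sketch of the extra bookkeeping, assuming the search was run with `seq_no_primary_term: true` so each hit carries `_seq_no` and `_primary_term`; the helper name and the sample index name are hypothetical.

```python
from urllib.parse import quote


def backing_index_update_path(hit):
    """Path for overwriting a document in the backing index that holds it.

    `hit` is one entry from hits.hits of a search that requested
    seq_no_primary_term; if_seq_no/if_primary_term make the write fail
    with a conflict instead of silently clobbering a concurrent change.
    """
    return (
        f"/{hit['_index']}/_doc/{quote(hit['_id'], safe='')}"
        f"?if_seq_no={hit['_seq_no']}&if_primary_term={hit['_primary_term']}"
    )


if __name__ == "__main__":
    # Hypothetical hit; backing index names follow .ds-<stream>-<date>-<generation>.
    hit = {"_index": ".ds-gov-docs-2025.06.01-000003", "_id": "doc-17",
           "_seq_no": 42, "_primary_term": 1}
    print("PUT", backing_index_update_path(hit))
```

Every update needs a lookup first to find the right backing index, which a regular index, where a document is updated directly by `_id`, does not require.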
My initial goal was mainly to reduce the Elasticsearch Serverless Search VCU baseline by finding a way to manage approximately 850 GB of non-boosted/non-time-series data more efficiently, and eventually even more once we introduce vector embeddings. However, given the current architecture and limitations, it seems there may not be an ideal solution for this type of mutable workload at the moment.
In any case, thank you again for the explanations and guidance. The information was very helpful in understanding the tradeoffs involved.