Please don't deprecate sparse vector fields

Hi there!

I noticed that support for sparse vectors is being deprecated due to lack of interest and use cases: https://github.com/elastic/elasticsearch/pull/48781.

This is a great article that discusses the trade-off between sparse and dense vectors as a balance of recall (how much we should pull back for a given query) and precision (how closely a term must match to be returned).

My use case is that I want to be perfectly precise with term queries. Given documents:

{
  "fieldA": {"one": 3, "two": 10, "five": 70},
  "title": "some document"
},
{
  "fieldA": {"three": 8, "four": 9, "ten": 33},
  "title": "some other document"
}

I want the term frequency of each term in fieldA to equal the number in the sparse vector.
Currently I'm repeating each term as many times as its count, but that doesn't scale once the numbers get large.
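
To make the workaround concrete, here is a minimal sketch of the repetition approach. The helper name `expand_counts` is made up for illustration and is not part of any Elasticsearch API; it only builds the text that would then be indexed into an ordinary text field.

```python
def expand_counts(counts):
    """Repeat each term as many times as its count, joined by spaces.

    Hypothetical helper: the resulting string would be indexed as plain
    text so that each term's frequency equals its count.
    """
    tokens = []
    for term, count in counts.items():
        tokens.extend([term] * count)
    return " ".join(tokens)


doc = {"fieldA": {"one": 3, "two": 10, "five": 70}, "title": "some document"}
expanded = expand_counts(doc["fieldA"])

# Three distinct terms already expand to 3 + 10 + 70 tokens.
print(len(expanded.split()))
```

The size of the indexed text grows with the sum of the counts rather than with the number of distinct terms, which is why this gets unwieldy for large values.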

I'm not quite sure how I'd query this since sparse vector queries are sparse vectors... is it possible to query a sparse vector field with a dense vector query?

Is this possible some other way?

Hello,
indeed, sparse vectors have been removed from Elasticsearch due to the lack of solid use cases. However, if we see interesting use cases that sparse vectors can address, we will consider reintroducing them. That said, the article you referenced seems to talk about a different type of sparse vector: the way traditional Lucene indexes are organized. Postings lists in a Lucene index do indeed represent sparse vectors.

About your use case, can you please give a more detailed example of the type of query you want to run, along with examples of matching documents?
It is possible that the rank_features field type and the rank_feature query can address your use case.
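
As a rough sketch of what that could look like, the snippet below only builds the request bodies as Python dictionaries and prints them as JSON; it does not talk to a cluster. The index name `my-index` and the helper functions are assumptions made for illustration, while the field names come from the example documents above. Note that rank_features accepts only positive numeric values, and the rank_feature query scores documents through a function of the stored value (saturation by default), so this may not reproduce exact term-frequency scoring.

```python
import json

INDEX = "my-index"  # hypothetical index name, for illustration only


def build_mapping():
    """Mapping that stores fieldA as a rank_features field."""
    return {
        "mappings": {
            "properties": {
                "fieldA": {"type": "rank_features"},
                "title": {"type": "text"},
            }
        }
    }


def build_query(feature):
    """rank_feature query on a single feature inside fieldA."""
    return {"query": {"rank_feature": {"field": f"fieldA.{feature}"}}}


# Body for: PUT /my-index
print(json.dumps(build_mapping(), indent=2))

# Body for: GET /my-index/_search
print(json.dumps(build_query("five"), indent=2))
```

With this mapping the documents can be indexed as they are, for example {"fieldA": {"one": 3, "two": 10, "five": 70}}, without repeating any terms.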
