Nested Vector Search -> Retrieve k chunks

Below is my use case for storing data in Elasticsearch for a workspace search that connects different data sources.

  1. Text from each file is chunked, and each chunk is stored as a separate document used for vector and keyword search.
  2. However, each file has a set of allowed users and allowed groups who can access it. Users can belong to groups, and search results must respect this access. I want to support both keyword and semantic search.
  3. I want to avoid duplicating permissions for each text chunk.
    What is the best way to index such data so that filtering is also easy? I want to filter the data while querying, instead of applying a pre-filter or post-filter.
  4. For access control, each file can have a list of allowed users, a list of allowed groups, or be accessible to all users.
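To make points 1 to 4 concrete, here is a minimal sketch of one possible layout: a single document per file, with permissions stored once at the file level and chunks stored as nested objects. All field names (`file_name`, `allowed_users`, `allowed_groups`, `is_public`, `chunks`) and the embedding size are assumptions for illustration, not an established schema. The code only builds request bodies as plain Python dicts, so no client is needed to read or run it.

```python
# Sketch only: field names and EMBEDDING_DIMS are hypothetical.
EMBEDDING_DIMS = 384  # assumption: depends on the embedding model in use


def build_file_mapping(dims: int = EMBEDDING_DIMS) -> dict:
    """Index body with one document per file and chunks as nested objects."""
    return {
        "mappings": {
            "properties": {
                "file_name": {"type": "keyword"},
                # Permissions live once per file, not once per chunk.
                "allowed_users": {"type": "keyword"},
                "allowed_groups": {"type": "keyword"},
                "is_public": {"type": "boolean"},
                "chunks": {
                    "type": "nested",
                    "properties": {
                        "text": {"type": "text"},  # keyword search
                        "embedding": {  # semantic search
                            "type": "dense_vector",
                            "dims": dims,
                            "index": True,
                            "similarity": "cosine",
                        },
                    },
                },
            }
        }
    }


def build_access_filter(user_id: str, group_ids: list[str]) -> dict:
    """Match files that are public, list the user, or list one of the user's groups."""
    should = [
        {"term": {"is_public": True}},
        {"term": {"allowed_users": user_id}},
    ]
    if group_ids:
        should.append({"terms": {"allowed_groups": group_ids}})
    return {"bool": {"should": should, "minimum_should_match": 1}}
```

The same filter dict could be reused for both the keyword query and the vector query, which is the main reason for keeping permissions on the parent document in this sketch.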

Suppose I have millions of files and their permissions to index, for example the Google Drive of an organisation accessed using service accounts. What would be the ideal data storage strategy for optimal search?

Assuming I decide to store a single document per file with nested vector search, I also need to find the top-k passages irrespective of the top-level document.

Even if the top 2 passages come from the same document, both of them should be returned.
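One way this requirement might be approached, as a sketch rather than a confirmed answer: run a nested kNN search that asks for `inner_hits` on the chunk field, then merge the inner hits from all returned files on the client and keep the k best passages overall. This assumes an Elasticsearch version in which nested kNN search accepts `filter` and `inner_hits`, and that the response nests passage hits under `inner_hits.<nested path>`. The field names `chunks.embedding` and `chunks.text` and both helper functions are hypothetical.

```python
import heapq


def build_passage_knn_search(query_vector: list[float], access_filter: dict,
                             k: int, num_candidates: int = 100) -> dict:
    """Search body: kNN over nested chunk vectors, filtered on file-level fields."""
    return {
        "knn": {
            "field": "chunks.embedding",  # hypothetical nested vector field
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
            "filter": access_filter,  # applies to top-level (file) metadata
            # Ask for up to k passages per file so one file can contribute several.
            "inner_hits": {"size": k, "_source": False, "fields": ["chunks.text"]},
        },
        "_source": ["file_name"],
    }


def top_k_passages(response: dict, k: int, nested_path: str = "chunks") -> list[dict]:
    """Flatten inner hits from every file and keep the k highest-scoring passages."""
    passages = []
    for file_hit in response.get("hits", {}).get("hits", []):
        inner = file_hit.get("inner_hits", {}).get(nested_path, {})
        for chunk_hit in inner.get("hits", {}).get("hits", []):
            passages.append({
                "file_id": file_hit["_id"],
                "chunk_offset": chunk_hit.get("_nested", {}).get("offset"),
                "score": chunk_hit["_score"],
                "fields": chunk_hit.get("fields"),
            })
    # Passages are ranked globally, regardless of which file they came from.
    return heapq.nlargest(k, passages, key=lambda p: p["score"])
```

Because the vector search is approximate, the merged list should be treated as best effort; raising `num_candidates` trades latency for recall.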

I believe you've already asked this question here: Elastic search document access based control and vector search

Please refrain from opening duplicate discuss threads, and be patient as we work to answer your old post. The discuss forums are monitored by our engineers during free cycles, and don't have a guaranteed response timeframe. If you need faster replies, consider our support or consulting services.