Deduplication - Collapse - Pagination

Hi,

I have two data sources: global (14M records) and customer-specific (5M records).
The data are products with the fields name, description, manufacturer part number (MPN), and vendor SKU.

Input: some search query.
Expected output:
Results grouped by MPN and ranked by relevance (the best-scoring document in a group determines that group's relevance), drawn from all data sources, with deduplication across pages and pagination over the grouped results.

Can I use Collapse on MPN with my search query on a unified data model (differentiating docs by source_type) here, with nearly 22M records, and still get fast retrieval?

Is there any other way to do it?
I can't pre-group the global data with the customer-specific data on MPN. I need that separation because there can be n customers and a single doc can be very large.

Is there any way to avoid grouping over the global data source docs, since that data set is huge? Is there some other approach using queries?

Unified data model:

Product from Global data source (no company_id)

{"mpn": "DELL-G7GV0","company_id": null,"source_type": "icecat","source_id": "80076143","product_name": "DELL G7GV0","description": "DELL G7GV0. Brand compatibility: Dell","manufacturer": "DELL","vendor_sku":"244545"}

Product from customer-specific data source (with company_id)

{"mpn": "DELL-G7GV0","company_id": "f1eb99a720ef4f05bfa6beae16aa235c","source_type": "custom_distributor","source_id": "34565434","product_name": "DELL G7GV0","description": "DELL G7GV0. Brand compatibility: Dell","manufacturer": "DELL","vendor_sku":"2345654"}  
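
To make the question concrete, the approach being asked about could be sketched roughly as below. This is only an illustration, not a tested solution: it assumes Elasticsearch field collapsing, that `mpn` and `company_id` are mapped as `keyword`, and that global docs have no `company_id` value. The function name `build_collapse_query`, the searched field list, and the `inner_hits` name and size are hypothetical choices.

```python
import json


def build_collapse_query(text, company_id, page, page_size=10):
    """Build a search request body that collapses hits on `mpn`.

    Assumptions (not confirmed by the thread): `mpn` and `company_id`
    are keyword fields, and global docs have a null/missing `company_id`.
    """
    return {
        # Pagination is applied to the collapsed (grouped) results.
        "from": page * page_size,
        "size": page_size,
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["product_name", "description", "mpn", "vendor_sku"],
                        }
                    }
                ],
                # Keep this customer's docs plus global docs (no company_id).
                "filter": [
                    {
                        "bool": {
                            "should": [
                                {"term": {"company_id": company_id}},
                                {"bool": {"must_not": {"exists": {"field": "company_id"}}}},
                            ],
                            "minimum_should_match": 1,
                        }
                    }
                ],
            }
        },
        # One top hit per MPN; with the default sort this is the best-scoring doc.
        "collapse": {
            "field": "mpn",
            # Optionally return the other docs of each group (name is illustrative).
            "inner_hits": {"name": "same_mpn", "size": 5},
        },
    }


if __name__ == "__main__":
    body = build_collapse_query("DELL G7GV0", "f1eb99a720ef4f05bfa6beae16aa235c", page=0)
    print(json.dumps(body, indent=2))
```

With collapsing, `from` and `size` page over the groups rather than the raw docs, so the same MPN should not reappear on a later page for the same query. The total hit count in the response still counts uncollapsed docs. Whether this is fast enough on ~22M records would need to be measured.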

@stephenb

Hi @gurcharan_singh Welcome to the community.

A couple of housekeeping items:

First, please do not @ people who have not already joined your topic; doing this is not considered good forum etiquette. It can also turn off or limit the people who might answer the question... and in fact this is not my area of expertise :slight_smile: so you have already limited your chance of success....

What version are you running?

You are asking a really broad question, like:

Can I use Collapse on MPN with my search query on a unified data model (differentiating docs by source_type) here, with nearly 22M records, and still get fast retrieval?

"Fast retrieval" can mean many things.
You are speaking of your data as if we all understand it :slight_smile:

Also, you will need to provide further examples, like a couple of documents and what your expected results are.

That will usually help ... the more you provide the more likely you will get help.

I may or may not be able to help, but perhaps with more info someone else can.