Filter out duplicate products

Once a day I parse a product feed and index the products to Elasticsearch.

I want to keep it up to date, and since delete operations are really expensive in Elasticsearch I chose to use a rollover index. Every day the products are written to a different index, products-yyyy-mm-dd.

Now when I search for products I want to look in today's and yesterday's indices but avoid returning duplicates for the same product_id.
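To make the two-index lookup concrete, here is a minimal sketch of how the search target could be built. The helper name `indices_to_search` is hypothetical; it only assumes the products-yyyy-mm-dd naming described above and that the search is sent to a comma-separated list of index names.

```python
from datetime import date, timedelta


def indices_to_search(today: date) -> str:
    """Return a comma-separated index list covering today and yesterday.

    Hypothetical helper: assumes indices are named products-yyyy-mm-dd.
    """
    yesterday = today - timedelta(days=1)
    names = [f"products-{d.strftime('%Y-%m-%d')}" for d in (today, yesterday)]
    return ",".join(names)
```

The returned string would be used as the index part of the search request, for example by calling `indices_to_search(date.today())`.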

Field collapsing seems like the way to go, so by simply doing:

"collapse" : {
	"field" : "product_id"
},
"sort": ["imported_at"]

I get unique hits by product_id. Unfortunately, the total number of hits and the aggregations are not affected by collapsing, so my filters and counters are not accurate.
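For the hit count specifically, one possible workaround is a cardinality aggregation on product_id next to the collapse. The sketch below builds such a request body as a plain dict. The function name `build_search_body`, the aggregation name `unique_products`, and the descending sort (to prefer the most recently imported copy) are my assumptions, not part of the setup above. Note that cardinality counts are approximate, and this does not by itself fix the document counts of other aggregations.

```python
def build_search_body(filters=None):
    """Build a search body that collapses on product_id and counts unique products.

    Hypothetical helper: `filters` is a list of query DSL filter clauses.
    """
    return {
        "query": {"bool": {"filter": filters or []}},
        # Return only one hit per product_id.
        "collapse": {"field": "product_id"},
        # Assumption: newest copy first, so the collapsed hit is the latest import.
        "sort": [{"imported_at": {"order": "desc"}}],
        "aggs": {
            # Approximate number of distinct product_id values matching the query.
            "unique_products": {"cardinality": {"field": "product_id"}}
        },
    }
```

The value under `aggregations.unique_products.value` in the response could then stand in for the total, instead of `hits.total`.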

How can I rethink my setup? Currently duplicate products will have the same _type and _id but different _index.
