Once a day I parse a product feed and index the products into Elasticsearch.
I want to keep it up to date, and since delete operations are really expensive in Elasticsearch, I chose to use a rollover index: every day the products are written to a different index, products-yyyy-mm-dd.
Now, when I search for products, I want to look in today's and yesterday's indices but avoid returning duplicates for the same product_id.
Field collapsing seems like the way to go, so by simply doing:
"collapse": {
  "field": "product_id"
},
"sort": ["imported_at"]
I get unique hits by product_id. Unfortunately, the total number of hits and the aggregations are not affected by collapsing: they still count every document, duplicates included, so my filters and counters are not accurate.
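To make the setup concrete, here is a minimal Python sketch of how the search described above could be put together. It only builds the index list and the request body and does not talk to a cluster. The helper names `daily_index` and `build_search`, the `match_all` placeholder query, and the example date are assumptions for illustration, not part of my real code.

```python
from datetime import date, timedelta
from typing import Any, Dict, Tuple


def daily_index(day: date) -> str:
    """Name of the index holding the feed imported on `day` (products-yyyy-mm-dd)."""
    return f"products-{day:%Y-%m-%d}"


def build_search(today: date) -> Tuple[str, Dict[str, Any]]:
    """Return (comma-separated index list, request body) covering today and yesterday."""
    yesterday = today - timedelta(days=1)
    # Elasticsearch accepts several index names separated by commas.
    indices = ",".join([daily_index(today), daily_index(yesterday)])
    body = {
        "query": {"match_all": {}},  # assumed placeholder for the real query
        "collapse": {"field": "product_id"},  # one hit per product_id
        "sort": ["imported_at"],
    }
    return indices, body


if __name__ == "__main__":
    indices, body = build_search(date(2018, 3, 2))  # example date only
    print(indices)  # products-2018-03-02,products-2018-03-01
```

The body would be sent to the `_search` endpoint of those two indices. Collapsing only changes which hits come back; the hit total and any aggregations added to this body are still computed over all matching documents in both indices.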
How can I rethink my setup? Currently, duplicate products will have the same _type and _id but a different _index.