Faceted search counts, the 'select distinct' problem

(Jamie Brough) #1

We're looking at migrating from ElasticSearch to postgres. The search
space is very small (10,000 holiday products) and Postgres is
performant for individual searches. However, we use faceted navigation
on the frontend, showing the number of venues with available products
matching the search criteria for alternate dates, regions, categories,
etc. With postgres, each result needs to be calculated individually;
ElasticSearch's faceting would be more suitable and has tested well
with our data.

Our search data has been denormalised to a collection of 'product'
documents, which basically resembles:

{ venue_id: 1,
product_id: 1,
venue_tags: [tag, tag],
product_tags: [tag, tag],
starts_at: 2011-12-3,
ends_at: 2011-12-19,
loc: [-0.1, 51.5] }

(the actual documents contain detailed venue and product information -
the venue information is repeated in each product document).

A facet search will give counts for all available products. However,
we need the number of distinct venues with available products,
something akin to "SELECT COUNT(DISTINCT venue_id) FROM ". It sounds like this functionality is planned but not yet

I solution might have been to embed products within a venue document:

{ venue_id: 1,
venue_tags: [tag, tag],
loc: [-0.1, 51.5],
products: [
{ product_id: 1,
product_tags: [tag, tag],
starts_at: 2011-12-6,
ends_at: 2011-12-19 },
{ product_id: 6,
product_tags: [tag, tag],
starts_at: 2011-12-7,
ends_at: 2011-12-21 } ] }

-- but we need to constrain matches to individual products, in order
to find venues that have 1 or more products available, and the 'cross
object' search match behaviour described in the nested type
documentation means this would be not give accurate results. If
product 1's tags match, an product 2's dates match - that should not
be a match as no single product is a match.

Does anyone have any suggestions how to better approach this, search
products but somehow aggregate the counts for faceted search by the


(system) #2