Finding related keys across documents (Self join in SQL)


(JPotter) #1

I have a document that has an session_id,event_type, and product_id. Something similar to

{ 'session_id': 1, event_type:'impression', product_id: 5 }
{ 'session_id': 2, event_type:'click', product_id: 1 }
{ 'session_id': 3, event_type:'impression', product_id: 4}
{ 'session_id': 1, event_type:'click', product_id: 5 }

I basically want to be able to identify if a session has had an impression and click on a product in two different documents.

In SQL I would get all of my sessions that had impressions and then I would join them on the clicks to find the intersection of the two sets

WITH impressions AS (
    SELECT DISTINCT session_id, product_id FROM data WHERE event_type = 'impression'
)

SELECT session_id, product_id
FROM data d
INNER JOIN impressions i 
    ON d.session_id = i.session_id
WHERE d.event_type = 'click'

Do I need to make two separate queries? Can I use a has_parent query?


(Mark Harwood) #2

Joins don't come cheap in distributed systems which is often the case with large, time-based indices.

If you're planning on doing a lot of session analysis consider building an entity-centric index around sessions: https://www.youtube.com/watch?v=yBf7oeJKH2Y


(system) #3