Emulating join-like behavior in es?


(Hari Shankar) #1

Hi,

Let's say I have 2 tables in my relational DB, one called customers, which
contains customer data and one called metrics, which contains customer
metrics against customer id and date. For most of my queries, I will need to
join the two tables on the customer id. How can this behavior be emulated
efficiently in es?

My SQL queries are usually like

"SELECT customers.customerid, SUM(metrics.logins), SUM(something else), ...
FROM metrics JOIN customers ON metrics.customerid=customers.customerid
WHERE metrics.date BETWEEN date1 AND date2
AND customers.isActive=1
GROUP BY customers.customerid
ORDER BY SUM(metrics.logins) DESC"

We could do this in the client side, but that would be very inefficient,
right? Can I use the _parent mapping and has_child query to do this? Does it
return both the relevant parent and child docs as a single merged document?
I could store both of them together in a de-normalized way, but that looks
ugly. Any other ideas?

Thanks,
Hari


(Karussell) #2

Take a look into facets:

http://www.elasticsearch.org/guide/reference/api/search/facets/

http://www.elasticsearch.org/guide/reference/api/search/facets/statistical-facet.html

On Jun 25, 4:04 pm, Hari Shankar shaan.h...@gmail.com wrote:

Hi,

Let's say I have 2 tables in my relational DB, one called customers, which
contains customer data and one called metrics, which contains customer
metrics against customer id and date. For most of my queries, I will need to
join the two tables on the customer id. How can this behavior be emulated
efficiently in es?

My SQL queries are usually like

"SELECT customers.customerid, SUM(metrics.logins), SUM(something else), ...
FROM metrics JOIN customers ON metrics.customerid=customers.customerid
WHERE metrics.date BETWEEN date1 AND date2
AND customers.isActive=1
GROUP BY customers.customerid
ORDER BY SUM(metrics.logins) DESC"

We could do this in the client side, but that would be very inefficient,
right? Can I use the _parent mapping and has_child query to do this? Does it
return both the relevant parent and child docs as a single merged document?
I could store both of them together in a de-normalized way, but that looks
ugly. Any other ideas?

Thanks,
Hari


(Hari Shankar) #3

Hi,

I understand facets will be used for the SUM(..) part of the query. But how
do I get data from both the customers and metrics table at once? We could
implement it client-side, but on server-side would be nicer, right?
Basically I need to show customer name (which comes from the customers
index) and customer logins (which is in metrics index, against customer id)
side by side.

Thanks,
Hari

On Mon, Jun 27, 2011 at 7:03 PM, Karussell tableyourtime@googlemail.comwrote:

Take a look into facets:

http://www.elasticsearch.org/guide/reference/api/search/facets/

http://www.elasticsearch.org/guide/reference/api/search/facets/statistical-facet.html

On Jun 25, 4:04 pm, Hari Shankar shaan.h...@gmail.com wrote:

Hi,

Let's say I have 2 tables in my relational DB, one called customers,
which
contains customer data and one called metrics, which contains customer
metrics against customer id and date. For most of my queries, I will need
to
join the two tables on the customer id. How can this behavior be emulated
efficiently in es?

My SQL queries are usually like

"SELECT customers.customerid, SUM(metrics.logins), SUM(something else),
...
FROM metrics JOIN customers ON metrics.customerid=customers.customerid
WHERE metrics.date BETWEEN date1 AND date2
AND customers.isActive=1
GROUP BY customers.customerid
ORDER BY SUM(metrics.logins) DESC"

We could do this in the client side, but that would be very inefficient,
right? Can I use the _parent mapping and has_child query to do this? Does
it
return both the relevant parent and child docs as a single merged
document?
I could store both of them together in a de-normalized way, but that
looks
ugly. Any other ideas?

Thanks,
Hari


(system) #4