Any idea about Join query

It seems hard to query with relation like SQL in elasticsearch.
I want some recommendation in my case. I have to two indices, user info and user event log.
Here is part of the data:
User Info

+-------+-----------+-----------+
| userId|   age     |   gender  |
+-------+-----------+-----------+
| userA |    19     |   male    |
+-------+-----------+-----------+
| userB |    29     |   female  |
+-------+-----------+-----------+
| userC |    39     |   female  |
+-------+-----------+-----------+

User Event log

+---------------------+---------+---------+
| timestamp           | action  | userId  |
+---------------------+---------+---------+
| 2019-10-13 08:34:03 |    A    | userA   |
+---------------------+---------+---------+
| 2019-10-13 11:39:26 |    D    | userB   |
+---------------------+---------+---------+
| 2019-10-14 11:41:23 |    A    | userA   |
+---------------------+---------+---------+
| 2019-10-14 12:10:56 |    B    | userB   |
+---------------------+---------+---------+
| 2019-10-14 12:51:42 |    B    | userA   |
+---------------------+---------+---------+
| 2019-10-14 15:23:48 |    C    | userA   |
+---------------------+---------+---------+
| 2019-10-14 16:05:19 |    B    | userB   |
+---------------------+---------+---------+
| 2019-10-14 19:44:01 |    D    | userC   |
+---------------------+---------+---------+
| 2019-10-15 03:18:15 |    D    | userA   |
+---------------------+---------+---------+

I want to find who doing action D between 2019-10-14 to 2019-10-15 (condition for event log) and gender is male (condition for user info). It should return user A in this case.

I have thought two solution:
The first solution is that get userid by conditions for event log first then using conditions for user info and userid from previous step. Second step query_string example:

gender: male AND ( userId: userB OR userId: userC OR ...)

This solution combines two query result by coding myself. I have some concerns: efficiency and too large query_string if have lot of userids from first step or just maybe get error "out of memory" in server program first.

The second solution is save the event log data in user info and using nest query, but the problem is event log size will be larger and larger. Then the user info document will reach max size limit in a document.

I would appreciate any ideas, thx.

A third option to consider is to store the user data on each event. If the user has limited data that is not frequently updated this way of denormalize game data is quite common.

1 Like

Since JOINs are not currently in our roadmap, the best solution I see is the one by @Christian_Dahlqvist : Add all the necessary user fields (gender, age) on every document of the User Event log to be able to execute the required queries in one step.

1 Like

Thx for your reply, Now I try to use the idea of @Christian_Dahlqvist , Here is my new user info data,

{
  "userId": "userA",
  "age": 19,
  "gender": "male",
  "action_A": ["2019-10-13 08:34:03","2019-10-14 11:41:23"],
  "action_B": ["2019-10-14 12:51:42"],
  "action_C": ["2019-10-14 15:23:48"]
}

Add field that the pattern like action_{event action} and store the timestamp with array of date.

Query string will be like:
gender: male AND action_A:[ 2019-10-14T00:00:00.000Z TO 2019-10-15T23:59:59.999Z ]

Hope for can help someone has the same question.

@matriv I have found your suggestion is copying the user info to event log, but it seems that will be have lots of redundant data and not easy to update user info. The advantage is a document size(event log) still be small. I think it should work that the user info is not frequently updated.

Hi @Mudboyzh,

An alternative is looping throw your events result get your user_ids to build a unique list [userA, userB, userC] and make a terms query to get all your related users.

https://www.elastic.co/guide/en/elasticsearch/reference/7.4/query-dsl-terms-query.html

Depend on your case and your data.

1 Like