Hi
I need help to create a query that creates metrics aggregations, merge created metrics to original documents and then filters documents using these metrics.
Let's consider following dataset with managers' sales:
manager | month | day | sale | |
---|---|---|---|---|
0 | Tom | 1 | 1 | 385 |
1 | Tom | 1 | 2 | 225 |
2 | Tom | 1 | 3 | 327 |
3 | Tom | 1 | 4 | 118 |
4 | Tom | 1 | 5 | 307 |
5 | Tom | 1 | 6 | 400 |
6 | Tom | 1 | 7 | 259 |
7 | Tom | 1 | 8 | 255 |
8 | Tom | 1 | 9 | 264 |
9 | Tom | 2 | 1 | 355 |
10 | Tom | 2 | 2 | 329 |
11 | Tom | 2 | 3 | 320 |
12 | Tom | 2 | 4 | 256 |
13 | Tom | 2 | 5 | 330 |
14 | Tom | 2 | 6 | 369 |
15 | Tom | 2 | 7 | 227 |
16 | Tom | 2 | 8 | 373 |
17 | Tom | 2 | 9 | 246 |
18 | John | 1 | 1 | 304 |
19 | John | 1 | 2 | 238 |
20 | John | 1 | 3 | 313 |
21 | John | 1 | 4 | 227 |
22 | John | 1 | 5 | 302 |
23 | John | 1 | 6 | 319 |
24 | John | 1 | 7 | 209 |
25 | John | 1 | 8 | 250 |
26 | John | 1 | 9 | 311 |
27 | John | 2 | 1 | 322 |
28 | John | 2 | 2 | 188 |
29 | John | 2 | 3 | 304 |
30 | John | 2 | 4 | 349 |
31 | John | 2 | 5 | 273 |
32 | John | 2 | 6 | 487 |
33 | John | 2 | 7 | 307 |
34 | John | 2 | 8 | 229 |
35 | John | 2 | 9 | 375 |
I want to filter docs where daily sale greater than monthly avg by certain manager.
If I had used Python and Pandas I would made following:
df['avg_by_month'] = df.groupby(['manager', 'month'])['sell'].transform('mean')
New data:
manager | month | day | sale | avg_by_month | |
---|---|---|---|---|---|
0 | Tom | 1 | 1 | 385 | 282.222 |
1 | Tom | 1 | 2 | 225 | 282.222 |
2 | Tom | 1 | 3 | 327 | 282.222 |
3 | Tom | 1 | 4 | 118 | 282.222 |
4 | Tom | 1 | 5 | 307 | 282.222 |
5 | Tom | 1 | 6 | 400 | 282.222 |
6 | Tom | 1 | 7 | 259 | 282.222 |
7 | Tom | 1 | 8 | 255 | 282.222 |
8 | Tom | 1 | 9 | 264 | 282.222 |
9 | Tom | 2 | 1 | 355 | 311.667 |
10 | Tom | 2 | 2 | 329 | 311.667 |
11 | Tom | 2 | 3 | 320 | 311.667 |
12 | Tom | 2 | 4 | 256 | 311.667 |
13 | Tom | 2 | 5 | 330 | 311.667 |
14 | Tom | 2 | 6 | 369 | 311.667 |
15 | Tom | 2 | 7 | 227 | 311.667 |
16 | Tom | 2 | 8 | 373 | 311.667 |
17 | Tom | 2 | 9 | 246 | 311.667 |
18 | John | 1 | 1 | 304 | 274.778 |
19 | John | 1 | 2 | 238 | 274.778 |
20 | John | 1 | 3 | 313 | 274.778 |
21 | John | 1 | 4 | 227 | 274.778 |
22 | John | 1 | 5 | 302 | 274.778 |
23 | John | 1 | 6 | 319 | 274.778 |
24 | John | 1 | 7 | 209 | 274.778 |
25 | John | 1 | 8 | 250 | 274.778 |
26 | John | 1 | 9 | 311 | 274.778 |
27 | John | 2 | 1 | 322 | 314.889 |
28 | John | 2 | 2 | 188 | 314.889 |
29 | John | 2 | 3 | 304 | 314.889 |
30 | John | 2 | 4 | 349 | 314.889 |
31 | John | 2 | 5 | 273 | 314.889 |
32 | John | 2 | 6 | 487 | 314.889 |
33 | John | 2 | 7 | 307 | 314.889 |
34 | John | 2 | 8 | 229 | 314.889 |
35 | John | 2 | 9 | 375 | 314.889 |
And now let's filter docs:
df.loc[df.sell > df.avg_by_month]
manager | month | day | sale | avg_by_month | |
---|---|---|---|---|---|
0 | Tom | 1 | 1 | 385 | 282.222 |
2 | Tom | 1 | 3 | 327 | 282.222 |
4 | Tom | 1 | 5 | 307 | 282.222 |
5 | Tom | 1 | 6 | 400 | 282.222 |
9 | Tom | 2 | 1 | 355 | 311.667 |
10 | Tom | 2 | 2 | 329 | 311.667 |
11 | Tom | 2 | 3 | 320 | 311.667 |
13 | Tom | 2 | 5 | 330 | 311.667 |
14 | Tom | 2 | 6 | 369 | 311.667 |
16 | Tom | 2 | 8 | 373 | 311.667 |
18 | John | 1 | 1 | 304 | 274.778 |
20 | John | 1 | 3 | 313 | 274.778 |
22 | John | 1 | 5 | 302 | 274.778 |
23 | John | 1 | 6 | 319 | 274.778 |
26 | John | 1 | 9 | 311 | 274.778 |
27 | John | 2 | 1 | 322 | 314.889 |
30 | John | 2 | 4 | 349 | 314.889 |
32 | John | 2 | 6 | 487 | 314.889 |
35 | John | 2 | 9 | 375 | 314.889 |
How can I create similar query using Query DSL?
Thank you