Term query with search result


I know ES is not a kind of RDB and no rich support for Joins available.

But let me ask you this can be done with ES for check.

- index1 is index
- user is type,
- _id is auto generated
- 6 documents indexed.

{"user_no":1, "country":"us"}
{"user_no":2, "country":"jp"}
{"user_no":3, "country":"gb"}
{"user_no":4, "country":"in"}
{"user_no":5, "country":"id"}
{"user_no":6, "country":"us"}

- index2 is index
- activity is type
- _id is autogenerated
- 6 documents indexed.

{"user_no":1, "point":10}
{"user_no":2, "point":20}
{"user_no":3, "point":13}
{"user_no":4, "point":23}
{"user_no":5, "point":44}
{"user_no":6, "point":19}

With data above, I want to do something like below in RDB.

  sum(point), count(*) 
  user_no in (
      select user_no from index1.user where country='us'

I checked out https://www.elastic.co/guide/en/elasticsearch/reference/master/query-dsl-terms-query.html#query-dsl-terms-lookup
But It seems with 'Terms-loopk up', all terms(which will be used in IN cluase) should be exist in one document.

No way in ES?


There are four imperfect ways that reflect the Elasticsearch's distributed system-ness:

  1. Do it yourself in your application. This might be the worst way.
  2. Parent/child. This works by routing all the document to the same shard that their parent will be on. Then you can join around parent and child. This is fairly niche but fairly useful. Not useful for when parents have lots of children though. I don't have a definition for "lots" sorry.
  3. nested. This is like automatic denormalization but it still has all the performance characteristics of denormalization.
  4. Manually denormalize. Make it look like:
{"point": 10, "user": {"number": 1, "county": "us"}}

None of these are great and, sadly, if you want to use parent/child you really need to experiment at the scale you expect. Here is a handy link about it.

Or the best way, depending, I guess.

@nik9000 Got it. Thanks!