To Join or Not to Join

Hi all,

When I was first learning Elasticsearch and designing my first index, a colleague of mine suggested not using ES parent/child joins because they were slow.

I countered by claiming that it was unlikely that Elasticsearch's implementation of join could be slower than a "client-side join." By "client-side join" I mean first doing one query and then a second query using the results from the first. By using the ES join, I avoid a round trip, serialization/deserialization, HTTP request overhead, etc. I was also worried that I might lose some roaring bitset speed by going the client-side route.
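To make "client-side join" concrete, here is a minimal sketch of what I have in mind. It assumes the two document kinds live in separate indices and that each child document stores its parent's ID in a keyword field; `parent_id`, the index names, and the `search` callable (a thin wrapper around whatever client you use, taking an index name and a request body and returning the response as a dict) are all hypothetical names for illustration, not anything from my real setup.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: (index_name, request_body) -> response dict
SearchFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def client_side_join(
    search: SearchFn,
    parent_index: str,
    child_index: str,
    parent_query: Dict[str, Any],
    join_field: str = "parent_id",  # assumed keyword field on child docs
    size: int = 1000,
) -> List[Dict[str, Any]]:
    """Run two queries and join them in the client."""
    # Round trip 1: fetch only the IDs of the matching parents.
    first = search(
        parent_index,
        {"query": parent_query, "_source": False, "size": size},
    )
    parent_ids = [hit["_id"] for hit in first["hits"]["hits"]]
    if not parent_ids:
        return []  # nothing to join against, skip the second request

    # Round trip 2: fetch children whose join field is one of those IDs.
    second = search(
        child_index,
        {
            "query": {"bool": {"filter": [{"terms": {join_field: parent_ids}}]}},
            "size": size,
        },
    )
    return [hit["_source"] for hit in second["hits"]["hits"]]
```

The extra cost is visible here: two requests instead of one, plus shipping the list of parent IDs back to the cluster. As far as I know, a `terms` query is also capped at 65,536 terms by default (`index.max_terms_count`), so a large first result would need batching.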

However, my data and index have evolved over time and turned out quite different from what I expected. For one thing, we have far more documents of one kind than the other, and the documents in the smaller group have many more fields.

As a result, I'm starting to consider that the downsides of keeping both kinds of documents in the same index might outweigh the downsides of a client-side join.

Any thoughts welcome!

Without knowing more about your data and use case, it is hard to comment.

Hi @Christian_Dahlqvist, I suppose I'm just curious whether it is ever a better idea to do a "client-side join" rather than letting Elasticsearch do the join.