Interesting question regarding "Joins"

I had this very same question with respect to user metadata. In my use case, the decoration task for logs is quite expensive, despite using Redis to hold all of my metadata.

In an effort to minimize the load on the decorator (since it's already under load from the current system), I thought it would be a good idea to re-index our metadata from scratch into its own Elasticsearch index and then "somehow magically" correlate the user data to a field like userid in Elasticsearch.

I'm just not sure what the "somewhat magical" component to this is. :smile:

Help would be appreciated! Please let me know if this needs any more details. The Stack Overflow post is not mine; I'm just referencing it.

Elasticsearch is not a relational system and does not support joins between indices, so unfortunately I don't think there is any "magical" component that can solve your problem. Support for parent-child relationships exists, but it requires all documents in the relationship to be located in the same shard, which makes it quite difficult to use with time-based indices.
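Since the join can't happen inside Elasticsearch, one workaround is to do it on the application side: run one search for the logs, a second lookup for the matching users, and merge the two in your own code. Below is a minimal sketch of just the merge step. The function name `join_logs_with_users` and the sample documents are made up for illustration, and plain dicts stand in for the hits returned by the two searches.

```python
from typing import Dict, Iterable, List


def join_logs_with_users(
    log_hits: Iterable[dict],
    user_index: Dict[str, dict],
) -> List[dict]:
    """Attach user metadata to each log document, keyed on `userid`."""
    joined = []
    for hit in log_hits:
        # Logs without a matching user get an empty `user` object,
        # i.e. this behaves like a left join.
        user = user_index.get(hit.get("userid"), {})
        joined.append({**hit, "user": user})
    return joined


# Hypothetical stand-ins for the results of two separate searches.
log_hits = [
    {"userid": "u1", "message": "login"},
    {"userid": "u2", "message": "logout"},
    {"userid": "u9", "message": "login"},
]
user_hits = [
    {"userid": "u1", "name": "Alice"},
    {"userid": "u2", "name": "Bob"},
]

# Build the lookup once, then decorate every log hit.
user_index = {u["userid"]: u for u in user_hits}
decorated = join_logs_with_users(log_hits, user_index)
```

In a real setup the second lookup would probably be a single `terms` query on `userid` against the metadata index, so the cost is two round trips per page of results rather than one lookup per log line.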

Although Elasticsearch is not a relational system, you could "cheat": set up Apache Hadoop with Apache Hive as the query infrastructure and use the great elasticsearch-hadoop connector to stream your data from Elasticsearch in real time.

In Hive (and its metastore) you can then set up relational tables, where each table represents an Elasticsearch query, and then do joins across those tables.

There is this old pull request that might help you (if it works):

My understanding from the Elastic team is that this PR will not be merged.

Ivan