Joins across heterogeneous documents in an index

Newbie to ES.

I have heterogeneous documents being thrown into the same index in ES. Some example documents (8 typical documents that would be in the index):


{"actor_id":1, "actor_name":"Adam Sandler"}
{"actor_id":2, "actor_name":"Jack Nicholson"}
{"actor_id":3, "actor_name":"Drew Barrymore"}
{"actor_id":4, "actor_name":"Jim Parsons"}
{"movie_id":1, "movie_name":"50 first dates", "actors":[1, 3]}
{"movie_id":2, "movie_name":"Anger Management", "actors":[1, 2]}
{"series_id":1, "series_name":"The Big Bang Theory", "performers":[4]}
{"movie_id":3, "movie_name":"Spoiler Alert", "actors":[4]}

What I now want to get out of this is "Give me a list of actors and their works", with the result in either of these formats:

Format-1
Actor Name, Work Name
Jim Parsons, The Big Bang Theory
Jim Parsons, Spoiler Alert
etc

Format-2
Actor Name, Works
Jim Parsons, [The Big Bang Theory, Spoiler Alert]
etc
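
To make the desired result concrete, here is a minimal Python sketch of the join done in application code over the eight sample documents above. It does not talk to ES at all. The names `WORK_FIELDS` and `actors_and_works` are made up for illustration, and the sketch assumes that only the association fields (`actors`, `performers`) and the name fields stay stable.

```python
from collections import defaultdict

# The eight sample documents, as they would sit in the single index.
docs = [
    {"actor_id": 1, "actor_name": "Adam Sandler"},
    {"actor_id": 2, "actor_name": "Jack Nicholson"},
    {"actor_id": 3, "actor_name": "Drew Barrymore"},
    {"actor_id": 4, "actor_name": "Jim Parsons"},
    {"movie_id": 1, "movie_name": "50 first dates", "actors": [1, 3]},
    {"movie_id": 2, "movie_name": "Anger Management", "actors": [1, 2]},
    {"series_id": 1, "series_name": "The Big Bang Theory", "performers": [4]},
    {"movie_id": 3, "movie_name": "Spoiler Alert", "actors": [4]},
]

# Assumption: every kind of work uses its own title field and its own
# actor-id list field, so the pairs are listed by hand (hypothetical config).
WORK_FIELDS = [
    ("movie_name", "actors"),
    ("series_name", "performers"),
]


def actors_and_works(documents):
    """Join actors to their works; returns {actor_name: [work_name, ...]}."""
    # Lookup table: actor_id -> actor_name.
    names = {d["actor_id"]: d["actor_name"] for d in documents if "actor_id" in d}
    works = defaultdict(list)
    for d in documents:
        for title_field, ids_field in WORK_FIELDS:
            if title_field not in d:
                continue
            for actor_id in d.get(ids_field, []):
                actor = names.get(actor_id)
                if actor is not None:  # skip ids with no matching actor document
                    works[actor].append(d[title_field])
    return dict(works)


# Format-2: one entry per actor with a list of works.
format_2 = actors_and_works(docs)

# Format-1: one (actor, work) row per association.
format_1 = [(actor, work) for actor, titles in format_2.items() for work in titles]

for actor, work in format_1:
    print(f"{actor}, {work}")
```

Every new kind of work (documentaries, charity work, etc.) would need another entry in `WORK_FIELDS`, which is exactly the ongoing reshaping effort I am hoping to avoid or at least keep small.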

Questions

  1. I'm perhaps trying to get too much out of ES without really shaping the data properly. But the data is highly fluid and I don't even own it; the owners could change the attributes at will, except perhaps the ones that model the associations. The list of data types is long (in the above example it could be movies, series, documentaries, charity work, etc.) and continues to grow. So constantly reshaping it to suit my queries is going to be a lot of ongoing stress.

  2. I'm not constrained to a single index: I could put the actors into one, movies into one, series into one, etc. It all depends on what's preferred with ES given my queries above. I intend to use either the ES SQL interface or the DSL for querying. Is it possible to query as above across different indexes? (I notice that a cross-index query via the ES SQL interface is possible only if the schema is the same, so am I making it harder for myself if I use different indexes?)

  3. Which query interface in ES is best suited for this type of query (DSL vs SQL, etc.)?

  4. Should I consider the has_parent approach etc instead?

  5. Finally, am I being too optimistic in trying this without shaping the data first? If I do need to reshape it, what are your suggestions on what I should solve with respect to the models?

Thank you so much for patiently reading through. I look forward to seeing your comments and suggestions.

Bump.. Any help pls?
