Efficient way to query Elasticsearch

Below is the short version of the dataset I am working on.

ID				TYPE				DATA			SERIES_ID	    VALIDITY 
MOVIE_1			MOVIE			MOVIE_DATA_1		NULL			2000-2050
MOVIE_2			MOVIE			MOVIE_DATA_2		NULL			2000-2050
SERIES_1		TV_SERIES		SERIES_DATA_1		NULL			2000-2050
SERIES_2		TV_SERIES		SERIES_DATA_2		NULL			2000-2050
SERIES_3		TV_SERIES		SERIES_DATA_3		NULL			2000-2050
EPISODE_1		EPISODE			EPISODE_DATA_1		SERIES_1		2000-2050
EPISODE_2		EPISODE			EPISODE_DATA_2		SERIES_2		2000-2010 

I want to write a query which returns all the documents of TYPE = (MOVIE, TV_SERIES) that are valid (meaning the present year (2020) is between the years specified in VALIDITY). However, while selecting TV_SERIES it should also check if that TV_SERIES has an EPISODE that is valid.

So for the above dataset the query should return

ID          TYPE        DATA            SERIES_ID     VALIDITY 
MOVIE_1		MOVIE		MOVIE_DATA_1	NULL	      2000-2050    (since it is valid)
MOVIE_2		MOVIE		MOVIE_DATA_2	NULL	      2000-2050    (since it is valid)
SERIES_1	TV_SERIES	SERIES_DATA_1	NULL	      2000-2050    (since it has EPISODE_1 which is valid)

SERIES_2 SHOULD NOT be returned since its episode (EPISODE_2) is not valid for present year

SERIES_3 SHOULD NOT be returned since it does not have any episodes.

This is merely a question around data modeling, and thus has the capacity of dozens of different answers.

Episodes being children of series is something that requires modeling, either using the join datatype or the nested datatype - with the latter you store all the episodes within the series document.

This way you could search for movies with validity OR series with a valid episode using a nested query.

Validity could be modeled using a range datatype.

hope that helps to get started.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.