Grouping based on multiple properties

Dennis_Jakobsen · December 2, 2016, 2:01am

I have an index of movies. The movies are sourced from different web sites and there are overlaps - for example Superman III is sourced from both imdb and hulu. For each movie I have the title and the name of the site it came from. Some of them also have a director and a list of actors. The titles may vary a little depending on which source they came from.

I would like to group the movies together so one group contains all the instances of the same movie - for example one group could be Superman III from imdb, hulu and Netflix.

Is that a good use case for the graph, and how would you go about doing that?

Mark_Harwood · December 2, 2016, 2:56pm

This is really a question about data preparation (de-duplication, entity resolution).

Once the data is normalised and linked, Graph should be good for exploring the connections but data-prep is often a big part of processing most real-world data sources.

There are various techniques you can use to link data. Normalization is a big one - e.g. do you turn Superman III into Superman 3 using rules to remove Roman numerals at the end of film titles? Do you remove accents from certain characters?
Do you combine information e.g. film title and year to ensure you get the right Cape Fear?
Do you combine actor-name and movie to avoid one James Stewart being linked with a different James Stewart? Much of this is data-dependent, so without knowing more about the data in question it is hard to prescribe an answer that is guaranteed to work.
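
To make the normalization ideas above concrete, here is a minimal Python sketch, not a definitive recipe. It assumes each movie is a dict with `title`, `site` and an optional `year` (the field names, the helper functions and the sample rows are made up for illustration). It strips accents, turns a trailing Roman numeral into a digit, and groups on a combined title-plus-year key.

```python
import re
import unicodedata
from collections import defaultdict

# Assumed lookup table for trailing Roman numerals (II to X only).
ROMAN = {"ii": "2", "iii": "3", "iv": "4", "v": "5", "vi": "6",
         "vii": "7", "viii": "8", "ix": "9", "x": "10"}

def strip_accents(text):
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def normalize_title(title):
    text = strip_accents(title).lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # punctuation becomes whitespace
    words = text.split()
    # Only the last word is treated as a possible sequel number.
    if len(words) > 1 and words[-1] in ROMAN:
        words[-1] = ROMAN[words[-1]]
    return " ".join(words)

def movie_key(title, year=None):
    # Title plus year, so that remakes with the same title stay apart.
    return (normalize_title(title), year)

def group_movies(movies):
    groups = defaultdict(list)
    for movie in movies:
        key = movie_key(movie["title"], movie.get("year"))
        groups[key].append(movie["site"])
    return dict(groups)

# Illustrative sample rows, not real index data.
movies = [
    {"title": "Superman III", "year": 1983, "site": "imdb"},
    {"title": "Superman 3", "year": 1983, "site": "hulu"},
    {"title": "Cape Fear", "year": 1962, "site": "imdb"},
    {"title": "Cape Fear", "year": 1991, "site": "netflix"},
]
groups = group_movies(movies)
```

Rules like these are exactly where the data-dependence bites: the trailing-numeral rule would also rewrite Malcolm X as "malcolm 10", so any such table needs checking against the actual titles.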

Dennis_Jakobsen · December 2, 2016, 4:48pm

Thank you Mark. This is helpful. I was exploring the possibility of using Elasticsearch to figure out the linking: since it knows the title, director and actors, it should be able to do a fuzzy match based on its statistical models.

I will try to create a new text field, dump the movie name, director and list of actors into it, and link on that.
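
As a rough illustration of the combined-field idea, here is a small Python sketch that runs outside Elasticsearch. It uses plain token overlap (Jaccard similarity) as a stand-in for whatever scoring the search engine would apply, so it is not how Elasticsearch itself matches. The field names, the `0.5` threshold and the sample rows are all assumptions.

```python
import re

def link_text(movie):
    # The combined "link" field: title, director and actors in one string.
    parts = [movie.get("title", ""), movie.get("director", "")]
    parts += movie.get("actors", [])
    return " ".join(p for p in parts if p)

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a, b):
    # Jaccard overlap of the token sets of the two combined fields.
    ta, tb = tokens(link_text(a)), tokens(link_text(b))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def group_by_similarity(movies, threshold=0.5):
    # Union-find: any pair scoring above the threshold ends up in one group.
    parent = list(range(len(movies)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(movies)):
        for j in range(i + 1, len(movies)):
            if similarity(movies[i], movies[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, movie in enumerate(movies):
        groups.setdefault(find(i), []).append(movie["site"])
    return list(groups.values())

# Illustrative sample rows, not real index data.
movies = [
    {"title": "Superman III", "site": "imdb", "director": "Richard Lester",
     "actors": ["Christopher Reeve", "Richard Pryor"]},
    {"title": "Superman 3", "site": "hulu", "director": "Richard Lester",
     "actors": ["Christopher Reeve"]},
    {"title": "Cape Fear", "site": "netflix", "director": "Martin Scorsese",
     "actors": ["Robert De Niro"]},
]
linked = group_by_similarity(movies)
```

Comparing every pair is quadratic, so this only suits small samples; on a real index a search query on the combined field could narrow each movie down to a handful of candidates first.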

system · December 30, 2016, 4:48pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
