How can I search data from two indexes in Elasticsearch?

I have been stuck on a scenario and cannot find a proper solution. Here is the problem I am facing with Elasticsearch. Any help would be appreciated.

  • I have two indexes: one is video and the other is subtitles. A video can have multiple chunks of subtitles, which I store in the subtitles index along with the video_id.

  • Now, when I search for anything, I want to search the video title, the description and its subtitles as well.

  • But as far as I know, ES does not support relational mapping, so I was not able to search the subtitles the way I need.

  • Then I tried storing all the subtitles of a single video inside the video index as a nested array. But in the long term, a long video might have a lot of subtitles, which will make the document heavier and impact performance.

So I need your help to find a solution. Thank you.

Welcome!

In Elasticsearch, which is a NoSQL database optimized for search operations, traditional relational mapping as found in SQL databases (like MySQL or PostgreSQL) is not supported. However, Elasticsearch offers mechanisms to model relationships between data in ways that resemble relational connections. Here are the two primary approaches:

Nested objects:

{
  "name": "Star wars",
  "subtitles": [
    {
      "text": "Who's Your Daddy?!",
      "time": ...
    },
    {
      "text": "I'm",
      "time": ...
    }
  ]
}
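For the nested approach to behave as intended, the subtitles field has to be mapped with the nested type, and the search combines a match on the video fields with a nested query on the subtitles. A minimal sketch of the request bodies, assuming the field names title, description, subtitles.text and subtitles.time (the numeric type for time is an assumption, since the example above leaves it elided):

```python
import json
from typing import Dict


def nested_mapping() -> Dict:
    """Mapping where each subtitle is indexed as its own hidden nested document."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "description": {"type": "text"},
                "subtitles": {
                    "type": "nested",  # without this, subtitle objects are flattened
                    "properties": {
                        "text": {"type": "text"},
                        "time": {"type": "float"},  # assumption: seconds from start
                    },
                },
            }
        }
    }


def nested_search(keyword: str) -> Dict:
    """Match the keyword in title/description OR in any subtitle of the video."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"multi_match": {"query": keyword, "fields": ["title", "description"]}},
                    {
                        "nested": {
                            "path": "subtitles",
                            "query": {"match": {"subtitles.text": keyword}},
                            # inner_hits returns only the subtitles that matched, highlighted
                            "inner_hits": {"highlight": {"fields": {"subtitles.text": {}}}},
                        }
                    },
                ],
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(nested_mapping(), indent=2))
    print(json.dumps(nested_search("daddy"), indent=2))
```

Note that every nested object is still part of one parent document: updating a single subtitle reindexes the whole video, and by default a document may contain at most 10,000 nested objects (index.mapping.nested_objects.limit), which matters for very long videos.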

Parent-child relationships:

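// "relation" is a join field mapped as {"type": "join", "relations": {"video": "subtitle"}}
// (index, field and relation names are illustrative)

// Parent document (PUT videos/_doc/1)
{
  "name": "Star wars",
  "relation": "video"
}

// Child document, routed to its parent's shard (PUT videos/_doc/2?routing=1)
{
  "text": "Who's Your Daddy?!",
  "time": ...,
  "relation": {
    "name": "subtitle",
    "parent": "1"
  }
}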
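With the join field, parents and children must live in the same index, and each child must be indexed with routing set to its parent's id. A minimal sketch of the mapping and indexing requests, assuming the hypothetical index name videos, join field relation and relation names video/subtitle:

```python
from typing import Dict, Tuple

# Illustrative names (assumptions): index "videos", join field "relation",
# parent relation "video", child relation "subtitle".
INDEX = "videos"
JOIN_FIELD = "relation"


def join_mapping() -> Dict:
    """Mapping for a single index holding both videos (parents) and subtitles (children)."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "description": {"type": "text"},
                "text": {"type": "text"},  # subtitle text, only present on child docs
                JOIN_FIELD: {"type": "join", "relations": {"video": "subtitle"}},
            }
        }
    }


def parent_request(video_id: str, title: str, description: str) -> Tuple[str, Dict]:
    """(request line, body) for indexing a video as a parent document."""
    body = {"title": title, "description": description, JOIN_FIELD: {"name": "video"}}
    return f"PUT /{INDEX}/_doc/{video_id}", body


def child_request(subtitle_id: str, video_id: str, text: str) -> Tuple[str, Dict]:
    """(request line, body) for a subtitle child; routing is the parent id so the
    child is stored on the same shard as its parent."""
    body = {"text": text, JOIN_FIELD: {"name": "subtitle", "parent": video_id}}
    return f"PUT /{INDEX}/_doc/{subtitle_id}?routing={video_id}", body


if __name__ == "__main__":
    print(parent_request("1", "Star wars", ""))
    print(child_request("2", "1", "Who's Your Daddy?!"))
```

Because of the single-index requirement, this still does not join two separate indexes; it moves the videos and subtitles into one index with a typed relation between them.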

Thank you @elvee for the quick response. I am already using nested objects: each document has a field named 'subtitles' that stores all of that video's subtitles as nested objects. The problem is that, in future, documents with plenty of subtitles might become too heavy and performance might be impacted.

Regarding the parent-child relationship, I think it will be a little complex to implement, as the indexing load will increase.

Is there any other solution you can suggest, or is there any way I can achieve this using two different indexes?

Have you tested the approach with larger documents? If not, I would recommend you do so.

Parent-child does add complexity and overhead at index and query time, but it allows the data to be broken down into smaller documents.

As Elasticsearch does not support joins, these are the two main options. Another option that has not been discussed is to denormalise your data, e.g. store individual subtitles or groups of subtitles in separate documents together with all the video data.
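The denormalisation idea can be sketched as follows. This is a minimal illustration, assuming subtitles arrive as a list of text lines and using an arbitrary per-chunk size budget (max_chars); the helper names and the fields video_id, chunk_no and subtitles are hypothetical:

```python
from typing import Dict, List


def chunk_subtitles(lines: List[str], max_chars: int = 2000) -> List[List[str]]:
    """Group consecutive subtitle lines so each chunk's total line length stays at or
    below max_chars (newline separators not counted; an oversized line gets its own chunk)."""
    chunks: List[List[str]] = []
    current: List[str] = []
    size = 0
    for line in lines:
        if current and size + len(line) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append(current)
    return chunks


def denormalised_docs(video: Dict[str, str], lines: List[str], max_chars: int = 2000) -> List[Dict]:
    """One document per subtitle chunk, each carrying a copy of the video fields."""
    docs = []
    for i, chunk in enumerate(chunk_subtitles(lines, max_chars)):
        docs.append({
            "video_id": video["id"],
            "title": video["title"],
            "description": video["description"],
            "chunk_no": i,
            "subtitles": "\n".join(chunk),
        })
    return docs
```

Each resulting document is small and self-contained, so no join is needed at query time; the price is that the video fields are duplicated and must be rewritten in every chunk if they change.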


Thank you for the detailed response. I just wanted to ask two more things before proceeding with this approach.

Let's say I have a parent document with 5 child documents.

  • When I search for a keyword, suppose it matches the title of the parent and 2 of its child documents. In this case I should get one parent document in the response, with the 2 matching child documents inside that parent object, highlighted. Is that possible?

  • Certain videos have over 1 GB of subtitles, so we are looking to break them into smaller documents rather than one nested document. Is this approach still a good fit?

Waiting for your response, thanks.
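For the first question (one parent hit with its matching children nested inside, highlighted), a has_child query with inner_hits is the usual building block. A minimal sketch of the request body, assuming a join mapping with child relation subtitle, child text field text and parent fields title/description (all hypothetical names):

```python
from typing import Dict


def parent_child_search(keyword: str, max_children: int = 5) -> Dict:
    """Return videos whose title/description match OR that have matching subtitle
    children; inner_hits lists the matching children (highlighted) under each parent."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"multi_match": {"query": keyword, "fields": ["title", "description"]}},
                    {
                        "has_child": {
                            "type": "subtitle",
                            "query": {"match": {"text": keyword}},
                            "score_mode": "max",  # parent scored by its best child
                            "inner_hits": {
                                "size": max_children,
                                "highlight": {"fields": {"text": {}}},
                            },
                        }
                    },
                ],
                "minimum_should_match": 1,
            }
        },
        # highlights for matches on the parent's own fields
        "highlight": {"fields": {"title": {}, "description": {}}},
    }
```

Each parent hit in the response should then carry an inner_hits section (named after the child relation by default) listing only the matching subtitles.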

How much data do you store at the video level? If this is small compared to the amount of subtitles I would recommend looking into denormalising your data.
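With denormalised chunk documents, results can still be grouped per video at query time using field collapsing. A minimal sketch, assuming the hypothetical chunk fields video_id, title, description and subtitles; collapse requires video_id to be a keyword or numeric field with doc values:

```python
from typing import Dict


def denormalised_mapping() -> Dict:
    return {
        "mappings": {
            "properties": {
                "video_id": {"type": "keyword"},  # collapse needs keyword/numeric with doc_values
                "title": {"type": "text"},
                "description": {"type": "text"},
                "chunk_no": {"type": "integer"},
                "subtitles": {"type": "text"},
            }
        }
    }


def denormalised_search(keyword: str, per_video: int = 3) -> Dict:
    """Search all chunk documents, then collapse so each video appears once,
    with its best-matching chunks (and highlights) under inner_hits."""
    return {
        "query": {
            "multi_match": {"query": keyword, "fields": ["title", "description", "subtitles"]}
        },
        "collapse": {
            "field": "video_id",
            "inner_hits": {
                "name": "matching_chunks",
                "size": per_video,
                "highlight": {"fields": {"subtitles": {}}},
            },
        },
    }
```

Since every chunk repeats the title, a keyword that matches only the title makes every chunk of that video a hit; collapsing keeps the response to one hit per video, with at most per_video chunks listed under it.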

I do not use parent-child relationships often so am not the right person to answer this.

If the amount of subtitles is that large, parent-child or denormalisation is probably a better approach than using nested documents. Elasticsearch is not optimised for very large documents.


Definitely my favorite approach, and the one I would try before anything else. :wink:
