Handling data between multiple indices

I have 2 indices that are related to one another.

For example, let's say the 1st index contains all information pertaining to an e-book. Information like author, published date, title etc. will be indexed here.

And the 2nd index contains all paragraphs within a book. Information like book id, para content, page number, complex object information etc. will be indexed here.

When I want to query for paragraphs from the 2nd index based on information in the 1st index, like book title or published date, how do I do that?

  1. Is it advisable to store all the meta information of the 1st index inside the 2nd index so that I can apply filters and query its documents? That way I would be needlessly bloating the 2nd index with duplicate information which I already have in the 1st index.
  2. Is there a way I can form a relationship between these indices?
  3. Is it possible to maintain a single index for my case? Like storing all the paragraph related information in the 1st index itself as a list of objects. In this case, every document in the 1st index will be huge (let's say a list of 10000 paragraphs or more). Will querying still be efficient?

Or is there any other way I can solve this?

Any help is much appreciated.

Yes, this is a common way to solve this problem. Storing the same data across multiple documents will take up more space, but maybe not as much as you think. It is worth testing as it often greatly simplifies querying.
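As a rough illustration of that approach, the sketch below copies the book fields needed for filtering onto each paragraph document and then builds a search body that matches paragraph text while filtering on the copied fields. The field names (bookId, bookTitle, publishedDate, content) and both helper functions are assumptions made up for this example, not anything from your actual mapping.

```python
# Sketch only: field names and helpers are hypothetical, adjust to your mapping.

def denormalize(paragraph: dict, book: dict) -> dict:
    """Return a copy of the paragraph with the book fields needed for filtering."""
    doc = dict(paragraph)  # copy so the caller's paragraph is not modified
    doc["bookId"] = book["id"]
    doc["bookTitle"] = book["title"]
    doc["publishedDate"] = book["publishedDate"]
    return doc


def paragraph_query(text: str, book_title: str, published_from: str) -> dict:
    """Build a search body: full-text match on content, filtered by book metadata."""
    return {
        "query": {
            "bool": {
                # Scored full-text match on the paragraph itself.
                "must": [{"match": {"content": text}}],
                # Non-scoring filters on the duplicated book fields.
                "filter": [
                    {"match_phrase": {"bookTitle": book_title}},
                    {"range": {"publishedDate": {"gte": published_from}}},
                ],
            }
        }
    }
```

Only the fields you actually filter or sort on need to be copied, which keeps the duplication smaller than copying the whole book document.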

No. Elasticsearch does not support joins between indices.
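If you keep two separate indices, one workaround is to do the join in the application with two searches: first find the matching book ids, then search paragraphs restricted to those ids. The sketch below only builds the request bodies and reads ids out of a standard search response. The helper names, the title and content fields, and the assumption that bookId is mapped as a keyword field are all hypothetical.

```python
# Sketch only: a two-step, application-side "join" under the assumptions above.

def book_id_query(title: str, size: int = 1000) -> dict:
    """Step 1: search body for the book index, returning ids only."""
    return {
        "size": size,
        "_source": False,  # only the _id of each hit is needed
        "query": {"match": {"title": title}},
    }


def extract_ids(search_response: dict) -> list:
    """Collect the _id of every hit from a search response."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


def paragraphs_for_books(text: str, book_ids: list) -> dict:
    """Step 2: search body for the paragraph index, restricted to the given books."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"content": text}}],
                "filter": [{"terms": {"bookId": book_ids}}],
            }
        }
    }
```

This works reasonably when the first step returns a modest number of books. The terms query has a configurable upper bound on the number of terms (index.max_terms_count, 65,536 by default), so it does not scale to arbitrarily large id lists.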

I am not sure I understand this. Could you please elaborate and provide an example?

To add more info on my last question,
The current structure I have:
Parent index:
{
id: string,
title: string,
author: string,
.... (20 more fields including that of object types)
}
Child index:
{
id: string,
content: string,
bookId: string (id from the parent index),
bookTitle: string (title from the parent index),
.... (20+ more fields including that of object types)
}

Proposed structure:
Index:
{
Id: string,
title: string,
author: string,
...,
Paragraphs: List (Sometimes, there could be a list of 10000 or more children here)
}

By combining these 2 indices, I could potentially be storing lots of data for each document in the index. Is that fine, or will I have performance issues while
a) querying,
b) applying text/keyword search on both the parent and child objects?

Another option to look into could be to store the different documents in the same index and use the parent-child feature. This adds a bit of complexity and is a bit slower, but reduces the amount of duplication.
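A minimal sketch of what that could look like, using a join field in the mapping and a has_parent query to find paragraphs by book title. The field name relation, the relation names book and paragraph, and the helper functions are assumptions for illustration only.

```python
# Sketch only: parent-child in a single index via a join field (names are hypothetical).

def book_paragraph_mapping() -> dict:
    """Index body with a join field declaring book as the parent of paragraph."""
    return {
        "mappings": {
            "properties": {
                "relation": {"type": "join", "relations": {"book": "paragraph"}},
                "title": {"type": "text"},
                "content": {"type": "text"},
            }
        }
    }


def book_doc(title: str) -> dict:
    """Parent document: only names its side of the relation."""
    return {"title": title, "relation": {"name": "book"}}


def paragraph_doc(content: str, book_id: str) -> dict:
    """Child document: points at its parent's id.

    It must be indexed with routing set to book_id so that parent and
    child end up on the same shard.
    """
    return {"content": content, "relation": {"name": "paragraph", "parent": book_id}}


def paragraphs_by_book_title(text: str, title: str) -> dict:
    """Search body: paragraphs matching text whose parent book matches title."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"content": text}}],
                "filter": [
                    {"has_parent": {"parent_type": "book",
                                    "query": {"match": {"title": title}}}}
                ],
            }
        }
    }
```

With this layout the book metadata is stored once, and updating a book does not require reindexing its paragraphs. The cost is the extra lookup work at query time mentioned above.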
