Best way to implement relationship

ViniciusSPaiva · April 28, 2020, 1:59pm

Hello there!

Just to give some background: my company's application is an ECM (Enterprise Content Management) system, and, to put it very briefly, users can create:

Entities
Files

Some types of those entities (Emails, Documents, Folders, etc.) can have files: one, many, or even hundreds.
Other types (Clients, Projects, Contacts, etc.) don't have files of their own.

Those entities have fields of many different kinds: dates, numbers, short texts (255 characters), long texts (4000 characters), IDs, and so forth.

Our production relational database has a similar number of entities and files, both on the scale of millions.

Today we use two different Lucene-based indexes to search those two types of objects. One is managed by Hibernate Search (entities) and the other by Elasticsearch 6.x (file contents).

As you would expect, this causes a number of problems. To name a couple:
1- We are not able to create aggregate searches.
2- Performance is poor when searching file contents, because we first search the file index and then add the results to the entity search.

After all that background, our goal is to model this relationship using a single Elasticsearch index. After taking a course and reading some articles, I learned that there are two ways to accomplish that:

Nested Objects
Parent/child with join fields

The latter looks, at first glance, like the best fit for us: an entity can have many files, a file's content can be very big, and with nested objects it would be a burden to reindex the whole entity every time a file is added or modified.
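
To make the parent/child idea concrete, here is a minimal sketch of what I have in mind. It only builds the request bodies as plain Python dicts and does not talk to a cluster. The index name `ecm` and the field names `relation`, `title` and `content` are made up for illustration, and the mapping is written in the typeless form of Elasticsearch 7+ (on 6.x the `properties` would sit under a type name).

```python
import json

# Hypothetical index name, for illustration only.
INDEX = "ecm"

# Mapping with a join field: "entity" is the parent, "file" is the child.
# Field names ("relation", "title", "content") are assumptions.
mapping = {
    "mappings": {
        "properties": {
            "relation": {"type": "join", "relations": {"entity": "file"}},
            "title": {"type": "text"},
            "content": {"type": "text"},
        }
    }
}


def entity_doc(title):
    """Body of a parent (entity) document."""
    return {"title": title, "relation": {"name": "entity"}}


def file_doc(entity_id, content):
    """Body of a child (file) document; it points at its parent by ID."""
    return {"content": content, "relation": {"name": "file", "parent": entity_id}}


def index_request(doc_id, body, routing):
    """Describe an index request without sending it.

    A child must live on the same shard as its parent, so the parent's ID
    is passed as the routing value.
    """
    return {
        "method": "PUT",
        "path": f"/{INDEX}/_doc/{doc_id}?routing={routing}",
        "body": body,
    }


requests = [
    index_request("e1", entity_doc("Contract folder"), routing="e1"),
    # Adding or changing a file only touches this one child document.
    index_request("f1", file_doc("e1", "full text of the first file"), routing="e1"),
]

print(json.dumps(mapping, indent=2))
print(json.dumps(requests, indent=2))
```

The point of the sketch is the second request: a new file becomes its own document, so the entity and its other files are not reindexed.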

My concern, from reading about it, is performance:
https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html

The documentation page above says that this approach should be used with a "one-to-many relationship where one entity significantly outnumbers the other entity", which is not exactly our case, since we have a similar number of entities and files.

So, if possible, I would like some insight on performance issues that this could bring. If more details are needed, I would be happy to provide them.
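
For reference, this is roughly the kind of query whose cost I am worried about: a single request that filters entities and matches on the content of their files through `has_child`. It is only a sketch of the request body as a Python dict. The field names `entity_type` and `content`, the child relation name `file`, and the function name are assumptions for illustration.

```python
import json


def entity_search_by_file_content(text, entity_type):
    """Build a search body that returns entities of one type
    having at least one file whose content matches `text`.

    Field and relation names are hypothetical.
    """
    return {
        "query": {
            "bool": {
                # Filter on a field of the parent (entity) document.
                "filter": [{"term": {"entity_type": entity_type}}],
                # Match on the child (file) documents joined to it.
                "must": [
                    {
                        "has_child": {
                            "type": "file",
                            "query": {"match": {"content": text}},
                            # Score the entity by its best-matching file.
                            "score_mode": "max",
                        }
                    }
                ],
            }
        }
    }


body = entity_search_by_file_content("invoice", "Email")
print(json.dumps(body, indent=2))
```

This would replace our current two-step lookup (file index first, then the entity search) with one request, which is why the join overhead matters so much to us.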

system · May 26, 2020, 1:59pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
