Some background: my company's application is an ECM (Enterprise Content Management) system. To put it very briefly, users can create entities of several types.
Some entity types (Emails, Documents, Folders, etc.) can have files: one, many, or even hundreds of them.
Other types (Clients, Projects, Contacts, etc.) don't have files of their own.
These entities have many fields of different kinds: dates, numbers, small texts (255 characters), big texts (4000 characters), IDs, and so forth.
Our production relational database has a similar number of entities and files, both in the millions.
Today we use two different Lucene indexes to search these two kinds of objects: one is managed by Hibernate Search (entities) and the other by Elasticsearch 6.x (file contents).
As you would expect, this causes a number of problems. To name a couple:
1- We are not able to create aggregate searches.
2- Performance is poor when searching file contents, because we first search the file index and then add those results to the entity search.
With that background, our goal is to model this relationship in a single Elasticsearch index. From a course and some articles, I learned that there are two ways to accomplish that:
- Nested Objects
- Parent/child with join fields
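The two options differ mainly in where the file data lives. A minimal sketch of both mappings, written as Python dicts you could send as the index-creation body. The field names (`entity_type`, `created_at`, `content`, etc.) and the join field name `relation` are hypothetical, and the `_doc` level reflects the single mapping type used in ES 6.x (7.x drops that level):

```python
import json

# Hypothetical entity fields, one per kind mentioned above.
ENTITY_FIELDS = {
    "entity_type": {"type": "keyword"},  # e.g. "Email", "Client"
    "external_id": {"type": "keyword"},  # IDs: exact match only
    "created_at": {"type": "date"},      # dates
    "amount": {"type": "long"},          # numbers
    "title": {"type": "text"},           # small texts (up to 255 chars)
    "description": {"type": "text"},     # big texts (up to 4000 chars)
}


def nested_mapping():
    """Option 1: files live inside each entity document as nested objects."""
    props = dict(ENTITY_FIELDS)  # copy so ENTITY_FIELDS is not mutated
    props["files"] = {
        "type": "nested",
        "properties": {
            "file_name": {"type": "keyword"},
            "content": {"type": "text"},
        },
    }
    return {"mappings": {"_doc": {"properties": props}}}


def join_mapping():
    """Option 2: entities and files are separate documents linked by a join field."""
    props = dict(ENTITY_FIELDS)
    # "entity" is the parent relation, "file" the child relation.
    props["relation"] = {"type": "join", "relations": {"entity": "file"}}
    props["file_name"] = {"type": "keyword"}
    props["content"] = {"type": "text"}
    return {"mappings": {"_doc": {"properties": props}}}


if __name__ == "__main__":
    print(json.dumps(join_mapping(), indent=2))
```

With the nested mapping, an entity and all of its files form a single document; with the join mapping, each file is its own document in the same index.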
At first glance, the latter looks like the best fit for us. An entity can have many files, and with nested objects every file lives inside its entity's document, so adding or modifying a single file would mean reindexing the entity together with all of its files. A file's content can also be very large.
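To make the parent/child trade-off concrete, here is a sketch of the requests involved, again as plain Python data rather than calls to a specific client. The index name `ecm`, the join field `relation`, and the field names are hypothetical. A file is indexed as a child document routed to its parent's shard, so adding or changing a file touches only that document, and a single `has_child` query can combine entity fields with file contents:

```python
from urllib.parse import quote

INDEX = "ecm"  # hypothetical index name


def index_entity_request(entity_id, fields):
    """Parent document: the entity's own fields plus its join relation."""
    body = dict(fields)
    body["relation"] = {"name": "entity"}
    return "PUT", f"/{INDEX}/_doc/{quote(entity_id, safe='')}", body


def index_file_request(file_id, parent_id, file_name, content):
    """Child document: must be routed to the same shard as its parent."""
    body = {
        "relation": {"name": "file", "parent": parent_id},
        "file_name": file_name,
        "content": content,
    }
    path = (f"/{INDEX}/_doc/{quote(file_id, safe='')}"
            f"?routing={quote(parent_id, safe='')}")
    return "PUT", path, body


def entities_with_matching_files(entity_type, text):
    """One query: filter on entity fields, match on child file contents."""
    return {
        "query": {
            "bool": {
                "filter": [{"term": {"entity_type": entity_type}}],
                "must": [{
                    "has_child": {
                        "type": "file",
                        "query": {"match": {"content": text}},
                    }
                }],
            }
        }
    }
```

For example, `index_file_request("f1", "e42", "contract.pdf", "...")` produces `PUT /ecm/_doc/f1?routing=e42`, and the entity `e42` itself is not reindexed. The cost moves to query time instead: `has_child` queries are slower than plain queries, and the join field relies on global ordinals that Elasticsearch rebuilds after changes to a shard.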
My concern, after reading about it, is performance.
The article above says this approach should be used for a "one-to-many relationship where one entity significantly outnumbers the other entity", which is not exactly our case, since we have a similar number of entities and files.
So, if possible, I would like some insight into the performance issues this could bring. If more details are needed, I would be happy to provide them.