I'm new to Elasticsearch and I've been reading and playing around with it a lot lately but I still can't understand one thing and am hoping to find the answer here.
From my understanding, Elasticsearch is not meant for handling database relationships; it works much better when all of the data lives in one index.
What I can't figure out is why that is the better approach.
Here is my example:
In a typical relational database, I can have something like this:
Collection 1
{
name: 'test',
collection_2_id: '222',
collection_3_id: '333',
collection_4_ids: ['441', '442', '443']
}
Collection 2
{
id: '222',
name: 'collection 2'
}
Let's say Collection 1 has 10,000 documents and 5,000 of them share the same collection_2_id. When I change the name in Collection 2 for the document with ID '222', that's one query and it's a done deal.
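To make that concrete, here is a toy in-memory sketch (plain Python dicts and made-up data, not real database code) of why the normalized update is a single write:

```python
# Toy in-memory model of the two normalized collections.
collection_1 = [
    {"name": "test", "collection_2_id": "222"}
    for _ in range(5000)
]
collection_2 = {"222": {"id": "222", "name": "collection 2"}}

# One write updates the shared document...
collection_2["222"]["name"] = "renamed"

# ...and every reader that joins by ID sees the new name immediately.
def resolve(doc):
    """Join a Collection 1 document to its Collection 2 document by ID."""
    return collection_2[doc["collection_2_id"]]["name"]

assert all(resolve(doc) == "renamed" for doc in collection_1)
```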
In Elasticsearch this will not work, and from what I've read, the solution is to embed the contents of the related documents instead of their IDs.
The index should look something like this:
Collection 1 index
{
name: 'test',
collection_2: { name: 'collection 2' },
collection_3: ...
...
}
This is fine until someone changes the name of that Collection 2 document (the one embedded in 5,000 documents). Now, instead of one query, you have to run 5,000 updates, one for each document in the index.
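A toy sketch of the denormalized version (again plain Python with made-up data) shows the fan-out:

```python
# Toy in-memory model of the denormalized index: 10,000 documents,
# 5,000 of which embed a copy of the same Collection 2 name.
index = [
    {"name": "test", "collection_2": {"id": "222", "name": "collection 2"}}
    for _ in range(5000)
] + [
    {"name": "other", "collection_2": {"id": "999", "name": "something else"}}
    for _ in range(5000)
]

# Renaming the Collection 2 document now means touching every copy.
updated = 0
for doc in index:
    if doc["collection_2"]["id"] == "222":
        doc["collection_2"]["name"] = "renamed"
        updated += 1

print(updated)  # 5000 document updates instead of one
```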
To put things into perspective: in my case I have collections with several relationships, plus arrays that include more than one relationship. If I merge all of that data together, every change to the related data will trigger thousands of update requests.
And yes, I do need all of that data in the index, because search has to filter results on fields from these related collections.
So my question is: how do you solve this problem? Is it really efficient to run thousands of queries just to update a repeated field in each document?
Or should I restructure my database into one big collection with subcollections?
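For reference, the closest thing I've found so far is Elasticsearch's Update By Query API, which at least lets those 5,000 per-document updates be expressed as a single request. Here is a sketch of the request body built as a Python dict (the field names `collection_2` and its embedded `id` are just from my example above, and I haven't measured how this performs at scale):

```python
# Sketch of an Update By Query request body
# (sent as: POST /collection_1/_update_by_query).
# Assumes each denormalized document also keeps collection_2.id
# so the affected documents can be matched.
update_by_query_body = {
    "query": {
        # Match every document embedding Collection 2 doc '222'.
        "term": {"collection_2.id": "222"}
    },
    "script": {
        "lang": "painless",
        # Rewrite the embedded copy of the name in each matched document.
        "source": "ctx._source.collection_2.name = params.new_name",
        "params": {"new_name": "renamed"},
    },
}
```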