Relations Between Indices


Let's say I have two indices: a Customers index and a Products index. Each index has its own responsibility and unique data, but I obviously want to store the relations between the two, meaning that I want to know which products each customer has bought.

Do I lose something when I keep this "relation" stored in only one index? I think I do (honestly, I'm not sure why), so I want to store the relation in both indices rather than in only one. This lets me store the relation exactly how I want for each index (each index has its own behavior and relevant fields concerning the relation). In addition, it lets me avoid joins between the indices and makes all the data needed for aggregations available no matter which index they are executed on.

The only proper solution I could think of (after considering the assumptions below) is to index each relation twice, meaning that for each person I index which products he/she bought AND for each product I index who bought it.
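To make the double-indexing idea concrete, here is a minimal sketch in Python. It only builds the request body for Elasticsearch's `_bulk` endpoint and inspects a bulk-style response; it does not talk to a cluster. The index names `customer_purchases` and `product_buyers`, the field names, and the `customer:product` ID scheme are assumptions made for illustration, not something from the setup described above. Note that a bulk request is not atomic: each item succeeds or fails on its own, which is why the response has to be checked item by item.

```python
import json


def relation_id(customer_id, product_id):
    # Hypothetical ID scheme: the same deterministic ID is used in both
    # indices, so re-sending a relation overwrites instead of duplicating.
    return f"{customer_id}:{product_id}"


def build_bulk_body(customer_id, product_id,
                    customer_index="customer_purchases",
                    product_index="product_buyers"):
    """Build an NDJSON body that writes one relation into both indices."""
    rid = relation_id(customer_id, product_id)
    lines = [
        {"index": {"_index": customer_index, "_id": rid}},
        {"customer_id": customer_id, "product_id": product_id},
        {"index": {"_index": product_index, "_id": rid}},
        {"product_id": product_id, "customer_id": customer_id},
    ]
    # The bulk format is newline-delimited JSON and must end with a newline.
    return "\n".join(json.dumps(line) for line in lines) + "\n"


def failed_items(bulk_response):
    """Return (index, id) pairs for items that did not succeed."""
    failed = []
    for item in bulk_response.get("items", []):
        for _action, result in item.items():
            if result.get("status", 500) >= 300:
                failed.append((result.get("_index"), result.get("_id")))
    return failed
```

Because the document ID is derived from the relation itself, the application can simply retry any pair returned by `failed_items` (or log it) without worrying about creating a second copy.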

My Question

In my solution, the application that duplicates the data between the indices MUST know that the documents have been indexed successfully in BOTH indices (and, if not, hopefully retry or write to the log). I want to know whether there is any mechanism that allows transactions in Elasticsearch. I'm afraid not, since Elasticsearch is, after all, a NoSQL BASE DB. If it doesn't have such a mechanism, is there a smart way to run a periodic check that finds "corrupted" (unsynchronized) data, meaning a relation that has somehow been stored in only one index and not both?
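One possible shape for such a periodic check, sketched in Python under some assumptions: every relation has a deterministic ID that is identical in both indices, and the application has some way to list those IDs from each index (for example by paging through search results). The function name `find_unsynced` and the report keys are hypothetical; the comparison itself is just a set difference in both directions.

```python
def find_unsynced(customer_side_ids, product_side_ids):
    """Compare the relation IDs found in each index.

    Both arguments are iterables of relation IDs, assumed to have been
    collected separately from the two indices.
    """
    customer_side = set(customer_side_ids)
    product_side = set(product_side_ids)
    return {
        # Present on the customer side only: the product-side write was lost.
        "missing_in_products": sorted(customer_side - product_side),
        # Present on the product side only: the customer-side write was lost.
        "missing_in_customers": sorted(product_side - customer_side),
    }


if __name__ == "__main__":
    report = find_unsynced(["c1:p1", "c1:p2", "c2:p1"], ["c1:p1", "c2:p1", "c3:p7"])
    print(report)
```

Holding every ID in memory only works for modest data sizes. For larger indices, the same comparison could be run per time window or per ID range, so that each run handles a bounded slice.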

My Assumptions

  • Nested Types - One solution might be to create a products index and store a nested array of customers in it. I believe such a solution might create limitations, since I sometimes want to be able to query the customers directly.
  • Parent/Child - One solution might be to declare two types in the same index, so that the customers are the children of the products. I believe such a solution might be overkill. I'm concerned about parent/child maintenance and performance issues (shard limitations). Most of all, I don't need to create a "real" relation between the two, since, just as in real life, the data doesn't change after it is indexed. There is no need to update the customer or product details after the relation has been created, and the relation can't be deleted in the future. Therefore, creating a "physical" duplication of the data is really fine by me.

What do you think? :)
Is my solution actually reasonable after all?
Do you have a better solution?

I appreciate your time and effort in reading my problem and trying to resolve it.

I'd just have a products index, then a separate set of time-based indices listing the transactions.

Then you can do "joins" in the app.
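A rough sketch of what such an application-side "join" could look like in Python, assuming the transaction documents and product documents have already been fetched as plain dictionaries. The field names `customer_id`, `product_id`, and `name` are made up for illustration.

```python
from collections import defaultdict


def products_by_customer(transactions, products):
    """Join transactions to products in application code.

    `transactions` stands in for hits from the time-based transaction
    indices; `products` stands in for hits from the products index.
    """
    by_id = {p["product_id"]: p for p in products}
    result = defaultdict(list)
    for tx in transactions:
        product = by_id.get(tx["product_id"])
        if product is not None:  # skip transactions for unknown products
            result[tx["customer_id"]].append(product["name"])
    return dict(result)


if __name__ == "__main__":
    products = [
        {"product_id": "p1", "name": "Keyboard"},
        {"product_id": "p2", "name": "Mouse"},
    ]
    transactions = [
        {"customer_id": "c1", "product_id": "p1"},
        {"customer_id": "c1", "product_id": "p2"},
        {"customer_id": "c2", "product_id": "p1"},
    ]
    print(products_by_customer(transactions, products))
```

With this layout each relation is written exactly once (as a transaction document), so there is nothing to keep in sync between two indices.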
