I have three tables user, product and order.
Order table has user_id and product_id in it. one order entry will have exactly one user_id and one product_id.
So how should I design the data model in ES?
Options are:
Denormalize the tables [not possible for because of huge data sets]
Nested objects [also data duplication]
Parent-child relation [Order can have only one parent ?]
Should I also use one index with three different types of documents or one separate index for each type of document ??
want to return all three entity contents for a particular order using a single query.
When thinking about ES index/mapping design, you should first remove your RDBMS hat and think differently. Terms like "denormalization" and "data duplication" should not raise red flags anymore. ES is not afraid of ingesting huge amount of data and also does not fear duplicated data. So by all means, options 1 and 2 should not be ruled out too early in the process as you would rule them out in the RDBMS world.
Here's an attempt at describing how to use all the three techniques you mention on a small example similar to yours. You might want to play around with your data and try the different techniques.
In the end, it all boils down to a few factors (unexhaustive list, but gets you started):
Are you (re-)indexing the data frequently?
If yes, how frequently?
How much data of each type do you have?
Most importantly, how do you query your data? Always design your data so that it'll be easy to query it afterwards.
Are you (re-)indexing the data frequently? YES
If yes, how frequently? 2/3 times in a day
How much data of each type do you have? Order : 1.3GB, Product: 1GB, User: 500MB
we have different types of query
Need to do full text search on Product and returning all three types.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.