Say I have a store and each product I sell has a document in Elasticsearch:
{
  "id": "abc",
  "name": "mittens",
  "price": "$10.00"
},
{
  "id": "def",
  "name": "hat",
  "price": "$13.00"
}
My store's website lets users log in and review products. However, reviews at this store are shown per user rather than aggregated. That is, if Harry gives the mittens a 4-star review and Hermione gives them a 2-star review, Harry should see 4 stars and Hermione should see 2.
To achieve this, I need to store every user's individual reviews somewhere, and I'm wondering what the most performant and efficient way to store this data is.
Option 1: Store the ratings on the product document itself, like so:
{
  "id": "abc",
  "name": "mittens",
  "price": "$10.00",
  "ratings": [
    { "user": "Harry", "stars": 4 },
    { "user": "Hermione", "stars": 2 },
    ...
  ]
}
This makes retrieving and displaying the data really easy, but I wonder how reasonable this model is once I have thousands of users.
Option 2: Use separate indices in a relational-like model. The product index would hold product data only, with no ratings, and a second index would hold just the users' ratings, with documents like so:
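A rough sketch of what Option 1 implies, in Python, assuming Elasticsearch 7 or later. `PRODUCT_MAPPING` is a hypothetical mapping (the field types are my assumption) that declares `ratings` as `nested`; with the default object type, Elasticsearch flattens arrays of objects, so a query for `user: Harry` together with `stars: 2` would wrongly match the mittens document. `stars_for_user` is a hypothetical helper showing the read path: fetching the product returns the entire `ratings` array, which then has to be scanned to show a single user's rating.

```python
from typing import Optional

# Hypothetical mapping for Option 1. "ratings" is "nested" so each
# {user, stars} pair stays paired when queried; field types are assumptions.
PRODUCT_MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "price": {"type": "keyword"},
            "ratings": {
                "type": "nested",
                "properties": {
                    "user": {"type": "keyword"},
                    "stars": {"type": "integer"},
                },
            },
        }
    }
}


def stars_for_user(product_source: dict, user: str) -> Optional[int]:
    """Return the user's stars from a product _source, or None if unrated."""
    # Linear scan: the whole ratings array comes back with the product.
    for rating in product_source.get("ratings", []):
        if rating["user"] == user:
            return rating["stars"]
    return None


# Example _source for the mittens product, as in Option 1.
mittens = {
    "id": "abc",
    "name": "mittens",
    "price": "$10.00",
    "ratings": [
        {"user": "Harry", "stars": 4},
        {"user": "Hermione", "stars": 2},
    ],
}

print(stars_for_user(mittens, "Harry"))     # 4
print(stars_for_user(mittens, "Hermione"))  # 2
print(stars_for_user(mittens, "Ron"))       # None
```

Two costs grow with the number of users here: every new or changed rating reindexes the whole product document, and each nested entry is stored as a hidden document, with a per-document cap set by `index.mapping.nested_objects.limit` (10,000 by default).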
{
  "user": "Harry",
  "ratings": [
    { "id": "abc", "stars": 4 },
    ...
  ]
},
{
  "user": "Hermione",
  "ratings": [
    { "id": "abc", "stars": 2 },
    { "id": "def", "stars": 3 },
    ...
  ]
}
This option keeps the individual documents in each index relatively small, but it complicates querying and displaying the data. I'm also concerned that using this relational paradigm is "wrong" for something like Elasticsearch.
So, at a high level, I'm wondering which option is more performant and scalable, or whether there is an even better option I haven't thought of. Thanks!
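A minimal sketch of the extra work Option 2 moves into the application, in Python. `products` and `hermione_ratings_doc` are hypothetical stand-ins for what two separate searches (one against each index) would return, and `join_ratings` is a hypothetical helper that does the join client-side, since Elasticsearch doesn't join documents across indices.

```python
from typing import Dict, List, Optional

# Hypothetical results of two separate requests: one to the products
# index, one to the ratings index (for the logged-in user).
products = [
    {"id": "abc", "name": "mittens", "price": "$10.00"},
    {"id": "def", "name": "hat", "price": "$13.00"},
]

hermione_ratings_doc = {
    "user": "Hermione",
    "ratings": [
        {"id": "abc", "stars": 2},
        {"id": "def", "stars": 3},
    ],
}


def join_ratings(products: List[dict], user_doc: Optional[dict]) -> List[dict]:
    """Attach the user's own stars to each product (None if unrated)."""
    # Index the user's ratings by product id so each lookup is O(1).
    stars_by_id: Dict[str, int] = {}
    if user_doc is not None:
        stars_by_id = {r["id"]: r["stars"] for r in user_doc["ratings"]}
    return [{**p, "stars": stars_by_id.get(p["id"])} for p in products]


for row in join_ratings(products, hermione_ratings_doc):
    print(row["name"], row["stars"])
# mittens 2
# hat 3
```

Because the join happens outside Elasticsearch, anything that depends on the user's own rating, such as sorting or filtering products by it, is awkward to do inside a single Elasticsearch query. Each user's document also grows with the number of products they rate, and every new rating reindexes that whole document.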