Hey there,
I have one index for all my data.
My document model is as follows:
- The index is called Products index.
- Each product document have nested documents (sub-documents) which contain its reviews.
- Each product has between 0 to 100 reviews.
Most of the product documents don't include any reviews at all, and those which include them have 20 reviews at average. - Reviews can be added to any product, even products created long time ago.
-
In most cases, products are reviewed next to their publish date.
Obviously one index which rules them all is bad in terms of performance and maintainability. I thought about separating my giant index into several time-based indices by using index sorting. The obvious solution for separating this index is to separate it by time. However, here's the challenge: I want to be able to find Product documents by their publish data, and Review nested documents by their review data.
The solution I thought of is this: creating new products index every month, and sorting the documents in each index by the review dates. This way, when I want to find products which was publish in certain month, I just search in the single relevant index. When I want to find reviews created in certain time, I have to search throughout all the indices smaller than the max date (for example, searching for reviews made this week demands searching all the indices) . But they are already sorted by review dates so it should be okay it terms of performance.
It looks like this:
Technical information about index sort configuration:
- index sort field:
Index sorting doesn't support nested types. Therefore, I will use a script that maintains the maximum review date inmaxReviewDate
field in the parent Product document. - Index sort order: desc
- Index sort missing: first
Let's say there's a product document which contain 3 reviews at: 24/02, 28/02, 30/03. The maxReviewDate
field will have the value of 30/03. If I want to find reviews that were created between 14/02 until 14/03, I need to find product doucments which their maxReviewDate
is bigger than 14/02.
GET Products_January, Products_February, Products_March / _search {
"query" : {
"bool" : {
"must" : [{
"range" : {
"maxReviewDate" : {
"gte" : "2018-02-14 00:00:00"
}
}
}, {
"nested" : {
"path" : "Reviews",
"query" : {
"range" : {
"Reviews.ReviewDate" : {
"gte" : "2018-02-14 00:00:00",
"lte" : "2018-03-14 00:00:00"
}
}
}
}
}
]
}
}
}
It would be wonderful the hear what you think about this solution.
Thanks a lot.