Index Sort And Range Queries


#1

Hey there,

I have one index for all my data.
My document model is as follows:

  • The index is called Products index.
  • Each product document have nested documents (sub-documents) which contain its reviews.
  • Each product has between 0 to 100 reviews.
    Most of the product documents don't include any reviews at all, and those which include them have 20 reviews at average.
  • Reviews can be added to any product, even products created long time ago.
  • In most cases, products are reviewed next to their publish date.

Obviously one index which rules them all is bad in terms of performance and maintainability. I thought about separating my giant index into several time-based indices by using index sorting. The obvious solution for separating this index is to separate it by time. However, here's the challenge: I want to be able to find Product documents by their publish data, and Review nested documents by their review data.

The solution I thought of is this: creating new products index every month, and sorting the documents in each index by the review dates. This way, when I want to find products which was publish in certain month, I just search in the single relevant index. When I want to find reviews created in certain time, I have to search throughout all the indices smaller than the max date (for example, searching for reviews made this week demands searching all the indices) . But they are already sorted by review dates so it should be okay it terms of performance.
It looks like this:

Technical information about index sort configuration:

  • index sort field:
    Index sorting doesn't support nested types. Therefore, I will use a script that maintains the maximum review date in maxReviewDate field in the parent Product document.
  • Index sort order: desc
  • Index sort missing: first

Let's say there's a product document which contain 3 reviews at: 24/02, 28/02, 30/03. The maxReviewDate field will have the value of 30/03. If I want to find reviews that were created between 14/02 until 14/03, I need to find product doucments which their maxReviewDate is bigger than 14/02.

GET Products_January, Products_February, Products_March / _search {
	"query" : {
		"bool" : {
			"must" : [{
					"range" : {
						"maxReviewDate" : {
							"gte" : "2018-02-14 00:00:00"
						}
					}
				}, {
					"nested" : {
						"path" : "Reviews",
						"query" : {
							"range" : {
								"Reviews.ReviewDate" : {
									"gte" : "2018-02-14 00:00:00",
									"lte" : "2018-03-14 00:00:00"
								}
							}
						}
					}
				}
			]
		}
	}
}

It would be wonderful the hear what you think about this solution.
Thanks a lot.


#2

Update: unfortunately, index sort doesn't work with mapping which contains nested types.

I hope, though, that using maxReviewDate will optimize some of the performance. After all, it's a non-nested query. On the other hand, range queries aren't cached in elasticsearch.

Can somebody approve that bool query will first try filtering non-nested queries first?


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.