Let's say I have one million product documents in this index under the
"product_data" document type. My requirements are:
1- Narrow down the products into buckets of "category and store" (e.g.
"Cameras - Fry's", "Cameras - Best Buy" ...)
2- Run queries on top of the buckets (the queries will involve multiple
fields, i.e. they will be complex queries)
Well, the final requirement is a guaranteed response time of less than
50 ms. So it makes sense to break the big index-document silo down into
smaller buckets and then run the big query on the candidate buckets. My
problem is that I do not understand how to do this properly in
Elasticsearch.
If I create a query and add two facets, "category" and "store", to it, what
would happen? Would Elasticsearch create buckets based on the category and
store facets behind the scenes and run the query on top of them? Or should
I create smaller indices, populate them with the category-store
permutations, and then run the queries against these newly created indices?
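On the facet option: a facet only reports per-value counts over the documents the query already matched; it does not pre-partition the index or narrow what the query searches. In current Elasticsearch versions facets have been replaced by aggregations. A minimal sketch of such a request body, built as a plain Python dict and assuming hypothetical `keyword` fields `category` and `store` plus a `title` field:

```python
def faceted_request(user_query: dict) -> dict:
    """Build a search body that runs user_query and also asks for
    per-category and per-store counts (field names are hypothetical)."""
    return {
        "query": user_query,
        # Aggregations (the successor to facets) are computed over the
        # documents that match "query"; they do not restrict the search.
        "aggs": {
            "by_category": {"terms": {"field": "category"}},
            "by_store": {"terms": {"field": "store"}},
        },
    }


body = faceted_request({"match": {"title": "camera"}})
```

The hits returned are the same as without the `aggs` section; only the counts are added.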
OK, let me rephrase the question. I am not interested in getting counts of
matching documents. In other words, I do not need faceted search to learn
the number of documents under each "category-store" combination.
I want to implement map-reduce-like functionality: buckets of
"category-store" already populated with product documents, and cached if
you like (each bucket would hold on the order of a thousand docs, not a
million), and then run the final query (on other fields like title,
description, brand ...) over one or a handful of buckets.
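The bucket idea can be modelled in memory to make the two stages concrete. A minimal sketch in plain Python with hypothetical sample documents and field names; it illustrates the grouping step and the per-bucket query, not how Elasticsearch stores data:

```python
from collections import defaultdict

# Hypothetical sample documents; field names are assumptions.
products = [
    {"title": "Canon EOS", "category": "Cameras", "store": "Fry's", "brand": "Canon"},
    {"title": "Nikon D3500", "category": "Cameras", "store": "Best Buy", "brand": "Nikon"},
    {"title": "Sony Alpha", "category": "Cameras", "store": "Fry's", "brand": "Sony"},
    {"title": "Dell XPS", "category": "Laptops", "store": "Best Buy", "brand": "Dell"},
]


def build_buckets(docs):
    """'Map' step: group documents by their (category, store) pair."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[(doc["category"], doc["store"])].append(doc)
    return buckets


def search_buckets(buckets, keys, predicate):
    """Run the final query (here a plain predicate) over the chosen buckets only."""
    # .get() does not create empty buckets for unknown keys.
    return [doc for key in keys for doc in buckets.get(key, []) if predicate(doc)]


buckets = build_buckets(products)
hits = search_buckets(buckets, [("Cameras", "Fry's")], lambda d: d["brand"] == "Sony")
```

Only the two documents in the ("Cameras", "Fry's") bucket are examined by the predicate; the rest are never touched.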
I want to use ES features for this requirement. I already have a backup
solution: query ES for the "category-store" combinations, store the results
in Mongo, and then run my queries against Mongo. This way I would have a
guaranteed response time of less than 50 ms. I just do not want to bring
Mongo into the picture.
Anybody? Somebody? I guess the answer is so obvious that nobody is
replying! So please correct me if I am wrong: do I need to query the
repository for "category-store" combinations (please read the first post
above to understand this combo), get the result for each set, create a new
index for each, and store the fetched results in the newly created smaller
index? That way I could route the secondary queries to these new indices.
Or should I trust ES and let it do the job it was designed to do?
I just came up with a cool idea. Since this is a two-legged process, I
should simply index the documents with an ID representing each combo. This
way I can retrieve the docs pertinent to each store-category combo really
fast. In the second leg I can simply provide my runtime queries along with
that ID. The catch is that I have to translate the runtime, user-provided
info into an ID, which in my case is not that hard. So thinking out loud
kind of helped me!
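The combo-ID idea could look roughly like this. A minimal sketch, assuming a hypothetical `bucket_id` field mapped as `keyword` (so the `term` filter matches it exactly) and a hypothetical slug format for the ID; the request bodies are plain dicts in the Query DSL shape used by recent Elasticsearch versions:

```python
import re


def combo_id(category: str, store: str) -> str:
    """Derive a stable bucket ID such as 'cameras--frys' (hypothetical format)."""
    def slug(text: str) -> str:
        # Drop apostrophes, turn other non-alphanumeric runs into single dashes.
        text = text.lower().replace("'", "")
        return re.sub(r"[^a-z0-9]+", "-", text).strip("-")
    # A slug never contains "--", so the separator stays unambiguous.
    return f"{slug(category)}--{slug(store)}"


def with_combo_id(doc: dict) -> dict:
    """Leg one: add the hypothetical 'bucket_id' field before indexing."""
    return {**doc, "bucket_id": combo_id(doc["category"], doc["store"])}


def bucket_query(category: str, store: str, runtime_query: dict) -> dict:
    """Leg two: combine the user's runtime query with a filter on the ID."""
    return {
        "query": {
            "bool": {
                "must": runtime_query,
                "filter": {"term": {"bucket_id": combo_id(category, store)}},
            }
        }
    }
```

The translation step the post mentions is `combo_id`: the same function runs at index time and at query time, so both sides always agree on the ID.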
It might make sense to simply use the same index. Why do you need to break
it into bucket "indices" (stored in ES or Mongo, it does not matter)? You
can provide the "bucket" query as a filter, which is cached, and then
execute the query "within" the bucket defined by that filter. This might be
the simplest option.
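In recent Elasticsearch versions, this filter-within-the-same-index approach maps onto a `bool` query: the bucket goes into the `filter` clause, which does not affect scoring and is eligible for caching, and the runtime query goes into `must`. A minimal sketch, assuming `category` and `store` are `keyword` fields and reusing the title, description and brand fields mentioned in the question:

```python
def filtered_bucket_query(category: str, store: str, user_text: str) -> dict:
    """Search one category-store 'bucket' of the same index (field names assumed)."""
    return {
        "query": {
            "bool": {
                # Scored part: the runtime query over the other fields.
                "must": {
                    "multi_match": {
                        "query": user_text,
                        "fields": ["title", "description", "brand"],
                    }
                },
                # Non-scoring bucket filter; filter-context clauses can be cached.
                "filter": [
                    {"term": {"category": category}},
                    {"term": {"store": store}},
                ],
            }
        }
    }
```

Compared with copying each bucket into its own index, nothing has to be kept in sync: a new category-store combination is just a different pair of filter values.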