Creating buckets within the index - is it faceted search or new sub-indexes?

alichi · December 6, 2011, 10:57pm

Hi gurus,
I have created an index and a document type with the following mapping:

{
  "product_data":{
    "_source":{
      "enabled":true
    },
    "properties":{
      "category":{
        "type":"string",
        "index":"analyzed",
        "analyzer":"keyword"
      },
      "store":{
        "type":"string",
        "index":"analyzed",
        "analyzer":"keyword"
      },
      .....
      .....
      .....
    }
  }
}

Let's say I have one million product documents in this index under
"product_data" document type. My requirement is:
1- Narrow down the products into buckets of "category and store" (e.g.
"Cameras - Fry's", "Cameras - Best Buy" ...)
2- Run queries on top of the buckets (The queries will have multiple fields
involved!, i.e. they will be complicated queries)

Well, the final requirement is to have a guaranteed response time of less
than 50 ms. So, it actually makes sense to break down the big
index-document silo into smaller buckets and then run the big query on
candidate buckets. My problem is that I do not understand how to do it
properly in elasticsearch.

If I create a query and add two facets, "category" and "store", to the
query, what would happen? Would elasticsearch create buckets based on the
category and store facets behind the scenes and run the query on top of them?
Or should I just create smaller indices, populate them with
category-store permutations, and then run the queries on these newly created
indices?

Thanks for your time in advance,
Ali

kimchy · December 7, 2011, 3:41pm

I did not manage to understand the question. If you use facets, you will
get counts bounded by the query you execute.
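
To illustrate the point about counts, here is a minimal sketch of a request body with two terms facets, written as a Python dict. It assumes the legacy facets API of Elasticsearch releases from that period (later releases replaced facets with aggregations); the facet names `by_category` and `by_store` and the sample query are hypothetical.

```python
import json


def facet_request(user_query: dict) -> dict:
    """Build a search body that asks for per-category and per-store counts.

    The facets only report counts for documents matching `user_query`;
    they do not partition the index or restrict later queries.
    """
    return {
        "query": user_query,
        "facets": {
            # Hypothetical facet names; "category" and "store" come from the mapping.
            "by_category": {"terms": {"field": "category"}},
            "by_store": {"terms": {"field": "store"}},
        },
    }


# Sample query on an assumed "title" field.
body = facet_request({"query_string": {"default_field": "title", "query": "zoom"}})
print(json.dumps(body, indent=2))
```

The response to such a request would list matching hits plus a count per category and per store, which is useful for navigation but does not by itself narrow the search to a bucket.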

alichi · December 7, 2011, 6:21pm

Ok, let me change the question. I am not interested in getting counts of
matching documents. In other words, I do not need to use faceted search to
know about number of documents under each "category-store" combination.

I want to implement map-reduce-like functionality: have buckets of
"category-store" already populated with product documents, and cached if you
like (each bucket would hold on the order of thousands of docs, not millions), and
then run the final query (querying other fields like title, description,
brand ....) over one or a handful of buckets.

I want to use ES features for this requirement. I already have a backup
solution: query ES for "category-store" combinations, store the results
in Mongo, and then run my queries against Mongo. This way I would have a
guaranteed response time of less than 50ms. I just do not want to bring
Mongo into the picture.

Thanks,
Ali

alichi · December 8, 2011, 5:18pm

Anybody? Somebody? I guess the answer is more than obvious, which is why I am
not getting any! So please correct me if I am wrong. Do I need to query
the repository for "store-products" combinations (please read the first post
above to understand this combo), get the result for each set, create
a new index for each, and store the fetched results in the newly created
smaller index? This way I could route the secondary queries to these newly
created indices. Or should I trust ES and let it do the job it was designed
to do?

Thanks,
Ali

alichi · December 9, 2011, 8:01pm

I just came up with a cool idea. Since this is a two-legged process, I should simply index the documents with an ID representing each combo. This way I could retrieve the docs pertinent to each store-category combo really fast. In the second leg I can simply provide my runtime queries along with the ID created. The catch is that I have to translate the user-provided runtime info into an ID. In my case that is not hard. So, thinking out loud kinda helped me!
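
One way the combo ID idea could look, as a rough sketch only: derive a stable key from the category and store and attach it to each document before indexing. The field name `combo_id`, the `__` separator, and the slug rules are all hypothetical choices, not something specified in this thread.

```python
import re


def combo_key(category: str, store: str) -> str:
    """Build a stable key for a category-store pair."""

    def slug(value: str) -> str:
        # Lowercase, then collapse each run of non-alphanumerics into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")

    return f"{slug(category)}__{slug(store)}"


def with_combo_key(doc: dict) -> dict:
    """Return a copy of a product document with a hypothetical 'combo_id' field."""
    tagged = dict(doc)
    tagged["combo_id"] = combo_key(doc["category"], doc["store"])
    return tagged


product = {"category": "Cameras", "store": "Best Buy", "title": "Compact zoom"}
print(with_combo_key(product))
```

At query time the same `combo_key` function would be applied to the user-provided category and store, so the indexed key and the lookup key always agree.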

kimchy · December 9, 2011, 9:58pm

Heya,

It might make sense to simply use the same index to do it. Why do you
need to break it into bucket "indices" (stored in ES or mongo, does not
matter)? You can have the "bucket" query provided as a filter, which is
cached, and then, execute the query "within" the bucket filtered by the
mentioned filter. This might be the simplest option.

-shay.banon
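
A minimal sketch of the filter approach described above, written as a Python dict for the request body. It assumes the legacy `filtered` query syntax of Elasticsearch releases from that period; newer releases express the same idea as a `bool` query with a `filter` clause. The function name `bucket_query` and the sample `title` query are hypothetical.

```python
import json


def bucket_query(category: str, store: str, user_query: dict) -> dict:
    """Run `user_query` only within one category-store bucket.

    The bucket is expressed as term filters, which can be cached and reused
    across requests, while the user query is scored as usual.
    """
    return {
        "query": {
            "filtered": {
                "query": user_query,
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"category": category}},
                            {"term": {"store": store}},
                        ]
                    }
                },
            }
        }
    }


# Sample user query on an assumed "title" field.
body = bucket_query(
    "Cameras",
    "Best Buy",
    {"query_string": {"default_field": "title", "query": "zoom lens"}},
)
print(json.dumps(body, indent=2))
```

Because `category` and `store` use the keyword analyzer in the mapping above, the term filters match the exact stored values, so the values passed in must match the indexed strings character for character.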

alichi · December 9, 2011, 10:48pm

Awesome, I was not sure about filter caching, thanks a million shay!