Modeling products and categories

Hi All,
I am quite confused about which type mapping fits my model.
The main type is "product". A product can be associated with multiple
categories. For simplicity, say categories are not nested, but a product can
have more than one category.
Now I am thinking of creating an index per category and storing products
routed to that category name.
This way, if Product#1 is associated with Category#A and Category#B, it
will be indexed in two indices (the category A and B indices). The doc will
be exactly the same in both.
This is okay (I suppose).
But if I run a search across all indices that matches this document, then
the result set will contain Product#1 twice.
How can I avoid that?

(I don't want to store products in a single index with category as an array
field and corresponding aliases filtered on category.)
Please suggest how this can be done, or done better.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I think you need to explain why you don't want to do it the simple and
straightforward way. The key to using search effectively is to denormalize
your data, not build some system that does one query well and will fall
flat outside of that small window.

You are trying to get prematurely clever. Don't use routing unless there is
a guaranteed 1-to-1 mapping between doc and routing value, and even then
only do it if it makes sense.

Best Regards,
Paul

@Paul, your reply serves no purpose. You haven't taken note of the question
I am asking and are just quoting your opinions about Elasticsearch.
It's better to be clever in planning than stupid in execution. If one query
is the most important query for which I am using Elasticsearch, then I will
model my data to suit it.

That said, I was experiencing data hotspots with one index plus
filtered/routed aliases (one per category). The guide suggests that if there
are hotspots, you convert them into indices, so I am now trying to model each
category as an index. The issue with this approach is that I have the same
document in multiple indices (as a product can be associated with multiple
categories).
I am not sure that's a good approach, so I would appreciate some advice.

Apologies, I did not mean to offend or give a meaningless reply. I should
have just left it at "more info, please."

To answer this question: "But if I run any search across all indices and
matching this document, then the resultset will contain this Product#1
twice.
How can i avoid that?"

Currently, you can't avoid dups unless you request more matches than you
need and dedup client side. There is likely work in the pipeline for
grouping results which will likely allow you to collapse these down, but I
don't know when that will be available. FWIW, I think Solr has the
capability to collapse results, but not sure if it does routing.
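
A minimal sketch of the client-side dedup mentioned above, in Python. It
assumes the same product is indexed under the same _id in every category
index, and that the hits arrive as a list of dicts with "_id" and "_index"
keys, already sorted by score. The sample hits and the name dedup_hits are
made up for illustration.

```python
def dedup_hits(hits, wanted):
    """Keep the first (highest-ranked) hit per _id, up to `wanted` results."""
    seen = set()
    unique = []
    for hit in hits:
        product_id = hit["_id"]
        if product_id in seen:
            # Same product already returned from another category index.
            continue
        seen.add(product_id)
        unique.append(hit)
        if len(unique) == wanted:
            break
    return unique


# Hypothetical hits from a search across two category indices.
# More hits are requested than needed, because duplicates will be dropped.
hits = [
    {"_index": "category_a", "_id": "1", "_score": 2.0},
    {"_index": "category_b", "_id": "1", "_score": 2.0},
    {"_index": "category_a", "_id": "2", "_score": 1.5},
]
top = dedup_hits(hits, wanted=2)
```

How many extra hits to request depends on how many products sit in more than
one category, so the over-fetch factor has to be tuned against real data.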

Can you elaborate on your "data hotspots" and include as many details as
possible on your configuration (nodes, settings, mappings, data, queries,
etc.)?

Best Regards,
Paul

Btw, if you haven't already, make sure you read this article:
http://www.elasticsearch.org/blog/customizing-your-document-routing/

A key excerpt:
When a document is indexed, you may only specify a single routing value. It
doesn’t make sense for a document to be routed to multiple shards!
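
To illustrate the excerpt: a routing value is hashed and reduced modulo the
number of primary shards, so one routing value always selects exactly one
shard. The sketch below is only an approximation of that idea. zlib.crc32 is
a stand-in for Elasticsearch's internal hash function, and shard_for is a
hypothetical helper, not part of any client library.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """Map a routing value to one shard number (illustrative only)."""
    # crc32 is an assumption; Elasticsearch uses its own hash internally.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# One routing value in, one shard number out, and always the same one.
shard_a = shard_for("category_a", 5)
shard_b = shard_for("category_b", 5)
```

A product that belongs to both categories would need two routing values to
be findable under either one, which is what the excerpt rules out.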

Alright, I will provide more info in a while.
But again you are digressing from the question (or at least from my
confusion).
That key excerpt talks about routing. Obviously a single document can't be
routed to (and hence indexed in) multiple shards.
I am clear on routing.
I am talking about indexing a document in two different indices (say
index:category_A and index:category_B, because the product is associated
with both categories and I am thinking of keeping each category as a
separate index).

I appreciate you taking the time.

There is a good reason that ES doesn't support multi-valued routing: it
just doesn't work without the capability to dedup docs (aka field
collapsing), which doesn't exist.

If you want to index a single doc to multiple category indexes, your query
must either:

  • Only hit a single category index at a time, or
  • Request a large result set in your query and dedup client side. If you do
    this, you can't paginate unless you pull all results down, which is a bad
    idea.

Dear original poster,
All of these problems seem to stem from your original decision of how to
store the category information for a product. In my view, based on
experience, you simply bypassed the best way to do this. When this is
pointed out you say:

( I dont want to store products in single index with category as array
field and corresponding aliases with filtered on category)

But some variant of this is exactly the correct way to do this according
to various successful indices in production today. (Don't know why you
mention aliases, though??)
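
For readers following along, a rough sketch of that single-index layout in
Python, with each product stored once and its categories held in an array
field. The field names "name" and "categories", the sample product, and the
helper functions are hypothetical. The request bodies use the bool/filter
form of the query DSL from later Elasticsearch releases, so the syntax
available in 2013 would differ, and the term filter assumes "categories" is
mapped as an exact-value (not analyzed) field.

```python
# One document per product; categories live in an array field.
product = {"name": "Product 1", "categories": ["category_a", "category_b"]}


def category_query(text, category):
    """Search body restricted to one category."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"name": text}},
                "filter": {"term": {"categories": category}},
            }
        }
    }


def all_products_query(text):
    """Search body across every category; each product can match only once."""
    return {"query": {"match": {"name": text}}}


body = category_query("shoes", "category_a")
```

Because each product exists as a single document, a search across all
categories cannot return it twice, and ordinary pagination keeps working.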

So rather than focusing on why routing is broken or why multiple indices
don't work correctly in your view, perhaps start another thread on correct
design. Tell us why you cannot simply store the category information along
with the product. What kind of queries do you need to support? Go up a
level.
