On Tuesday, June 7, 2011 at 6:18 AM, Andy wrote:
I have many categories of products. Each category has its own unique
fields.
For example, the category "Apparel" has the fields "size", "color", and "style".
The category "laptop" has the fields "processor", "RAM", and "screen size", and so on.
So if I use one index for all the products, there will be many
(potentially tens of thousands of) fields. Each field will only be used
by a small portion of the products. Is this "one index" approach a good
design? Will all those empty fields lead to wasted space?
It is less a matter of wasted space, but each field does come with a memory overhead.
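To get a feel for how sparse a single index would be, one could count the distinct field names across the catalog and the share of products that use each one. The sketch below is only an illustration: the `products` sample and the `field_usage` helper are hypothetical names made up for this example, and it says nothing about how much memory a particular search engine actually spends per field.

```python
from collections import Counter

# Hypothetical sample catalog; field names follow the examples in the question.
products = [
    {"category": "apparel", "size": "M", "color": "red", "style": "casual"},
    {"category": "apparel", "size": "L", "color": "blue", "style": "formal"},
    {"category": "laptop", "processor": "i5", "ram": "4GB", "screen_size": "13in"},
]


def field_usage(docs):
    """Return {field name: fraction of docs that contain that field}."""
    counts = Counter(field for doc in docs for field in doc)
    total = len(docs)
    return {field: n / total for field, n in counts.items()}


usage = field_usage(products)
print(len(usage), "distinct fields")
for field, share in sorted(usage.items()):
    print(f"{field}: used by {share:.0%} of products")
```

Running something like this over the real catalog would show how many fields the single index would end up with and how few products use each of them.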
On Tue, Jun 7, 2011 at 1:05 PM, Berkay Mollamustafaoglu wrote:
A couple of things, if I understood correctly.
You don't have to store the docs with separate fields. You can store each doc
as JSON and the data will be searchable.
You can use different "types" in a single index. The docs of each type
can have different fields.
In short, depending on the number of documents you have, you may be able to
store all your data in a single index with multiple types.
Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype
You may be aware of this, but since it wasn't obvious to me when I first
started out, a note on Berkay's point that you can use different "types"
in a single index and that the docs of each type can have different fields:
Just be aware that when using a single index, the fields across "types"
are not 100% independent of each other. By this I mean that if two different
"types" have a field with the same name, the mapping type (number, boolean, etc.)
of that field MUST be the same in both types.
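As a rough way to catch this before indexing, one could compare the planned field definitions of all types and flag any field name that appears with more than one mapping type. This is a minimal sketch in plain Python, not a call into any search engine API: the `mappings` dictionary, the clashing "size" field, and the `find_type_conflicts` helper are all hypothetical and exist only for illustration.

```python
def find_type_conflicts(mappings):
    """Find fields whose mapping type differs between types.

    mappings: {type name: {field name: mapping type}}
    Returns {field name: {mapping type: [type names using it]}}
    for every field that has more than one mapping type.
    """
    seen = {}
    for type_name, fields in mappings.items():
        for field, mapping_type in fields.items():
            seen.setdefault(field, {}).setdefault(mapping_type, []).append(type_name)
    return {field: by_type for field, by_type in seen.items() if len(by_type) > 1}


# Hypothetical planned mappings: "size" is a string for apparel
# but an integer for laptops, which would clash in one index.
mappings = {
    "apparel": {"size": "string", "color": "string"},
    "laptop": {"size": "integer", "processor": "string"},
}

conflicts = find_type_conflicts(mappings)
for field, by_type in conflicts.items():
    print(f"field '{field}' has conflicting mapping types: {by_type}")
```

A conflict like this could be resolved by renaming one of the fields (for example a per-category prefix) or by agreeing on one mapping type for the shared name.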
On Tuesday, June 7, 2011 at 10:29 PM, Andy wrote:
If I store the data as JSON without separate fields, will I still be
able to facet on the fields?
For example, when a user is searching on "apparel", I need to be able
to facet on fields like "size" and "color".

No, you will need to store them in separate fields. The memory overhead associated with each field is an in-memory data structure that the search engine keeps to make the field searchable.
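To make concrete what faceting on fields like "size" and "color" asks for, here is a small plain-Python sketch that counts the values of one field across the matching products. It is only a conceptual illustration and not how any search engine implements facets; the `products` sample and the `facet_counts` helper are hypothetical. It does show why each facetable attribute has to be addressable as its own field.

```python
from collections import Counter

# Hypothetical sample documents with "size" and "color" as separate fields.
products = [
    {"category": "apparel", "size": "M", "color": "red"},
    {"category": "apparel", "size": "M", "color": "blue"},
    {"category": "apparel", "size": "L", "color": "red"},
    {"category": "laptop", "processor": "i5"},
]


def facet_counts(docs, field):
    """Count how many docs have each value of `field`; docs lacking it are skipped."""
    return Counter(doc[field] for doc in docs if field in doc)


# "Search" for apparel, then facet the hits on size and color.
hits = [p for p in products if p.get("category") == "apparel"]
size_facet = facet_counts(hits, "size")
color_facet = facet_counts(hits, "color")

print("size:", dict(size_facet))
print("color:", dict(color_facet))
```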