Is it OK to have a large number of sparsely populated fields?


(Andy-2) #1

I have many categories of products. Each category has its own unique
fields.

For example, category "Apparel" has fields "size", "color", and "style";
category "laptop" has fields "processor", "RAM", and "screen size"; and
so on.

So if I use one index for all the products, there will be many
(potentially tens of thousands of) fields, each used by only a small
portion of the products. Is this "one index" approach a good design?
Will all those empty fields lead to wasted space?
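To put numbers on this, here is a minimal Python sketch that computes, for a handful of hypothetical product documents (the field names come from the question; the documents and values are invented), how many distinct fields a single shared index would need to cover and what fraction of documents populate each one.

```python
# Hypothetical product documents: field names follow the question above,
# the values are invented for illustration.
products = [
    {"category": "apparel", "size": "M", "color": "red", "style": "casual"},
    {"category": "apparel", "size": "L", "color": "blue", "style": "formal"},
    {"category": "laptop", "processor": "i5", "RAM": "8GB", "screen size": "13in"},
]


def field_usage(docs):
    """Return {field_name: fraction of docs that contain the field}."""
    counts = {}
    for doc in docs:
        for name in doc:
            counts[name] = counts.get(name, 0) + 1
    return {name: n / len(docs) for name, n in counts.items()}


usage = field_usage(products)
print(len(usage))  # distinct fields the shared index must hold
for name, fraction in sorted(usage.items()):
    print(f"{name}: {fraction:.0%} of docs")
```

With many categories, the same computation would report thousands of fields, most of them present in only a small fraction of the documents.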

Thanks


(Shay Banon) #2

Wasted space is less of a concern, but each field does come with a memory overhead.

(In reply to Andy, Tuesday, June 7, 2011 at 6:18 AM.)


(Andy-2) #3

Can you explain a bit about where the memory overhead comes from?

What kind of index structure would you recommend in this case?

Thanks.

(In reply to Shay Banon, Jun 7, 1:18 am.)


(Berkay Mollamustafaoglu-2) #4

A couple of things, if I understood correctly:

  1. You don't have to store the docs with separate fields. You can store them
    as JSON and the data would be searchable.
  2. You can use different "types" in a single index. The docs in each type
    can have different fields.
    In short, depending on the number of documents you have, you may be able to
    store all your data in a single index with multiple types.

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

(In reply to Andy, Tue, Jun 7, 2011 at 1:58 PM.)


(jacorob) #5

You may be aware of this, but since it wasn't obvious to me when I first
started out...

  2. You can use different "types" in a single index. The docs at each
    type can have different fields.

Just be aware that within a single index, fields across "types" are not
100% independent of each other. By this I mean that if two different
"types" have a field with the same name, the mapping type (number,
boolean, etc.) of BOTH fields MUST be the same.

Or as Shay explained in more detail (and probably more concisely) in a forum
post:
http://elasticsearch-users.115913.n3.nabble.com/Searching-across-types-tp1745420p1749010.html
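The constraint Bob describes can be illustrated with a small, self-contained Python check. This is only a sketch: the per-type mappings are plain dicts invented for the example (they are not Elasticsearch's mapping API), and the hypothetical "laptop" type reuses the name "size" as a numeric field purely to trigger the conflict.

```python
from collections import defaultdict

# Hypothetical per-type mappings within one index: type -> {field: mapping type}.
# The laptop type reusing "size" as a float is invented to show a conflict.
mappings = {
    "apparel": {"size": "string", "color": "string", "style": "string"},
    "laptop": {"processor": "string", "RAM": "integer", "size": "float"},
}


def conflicting_fields(type_mappings):
    """Return {field: set of mapping types} for field names that appear in
    more than one type with different mapping types."""
    seen = defaultdict(set)
    for fields in type_mappings.values():
        for name, mapping_type in fields.items():
            seen[name].add(mapping_type)
    return {name: kinds for name, kinds in seen.items() if len(kinds) > 1}


print(conflicting_fields(mappings))
```

Renaming one of the fields (for example, a hypothetical "apparel_size") or giving both the same mapping type would remove the conflict.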

Bob

(In reply to Berkay Mollamustafaoglu, Tue, Jun 7, 2011 at 1:05 PM.)


(Andy-2) #6

If I store the data as JSON without separate fields, will I still be
able to facet on the fields?

For example, when a user searches for "apparel", I need to be able
to facet on fields like "size" and "color".
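Conceptually, a facet counts the distinct values of one field across the documents that match a query, so each value has to be addressable as that field. A plain-Python sketch of that counting, with invented documents and no Elasticsearch client involved (the function name `facet_counts` is hypothetical):

```python
from collections import Counter

# Invented documents; only the field names come from the thread.
docs = [
    {"category": "apparel", "size": "M", "color": "red"},
    {"category": "apparel", "size": "L", "color": "red"},
    {"category": "apparel", "size": "M", "color": "blue"},
    {"category": "laptop", "processor": "i5", "RAM": "8GB"},
]


def facet_counts(documents, category, field):
    """Count values of `field` among documents in `category`,
    skipping documents that do not have the field at all."""
    return Counter(
        d[field] for d in documents
        if d.get("category") == category and field in d
    )


print(facet_counts(docs, "apparel", "size"))   # Counter({'M': 2, 'L': 1})
print(facet_counts(docs, "apparel", "color"))  # Counter({'red': 2, 'blue': 1})
```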

Thanks.

(In reply to Berkay Mollamustafaoglu, Jun 7, 2:05 pm.)


(Shay Banon) #7

No, you will need to store them in separate fields. As for the memory overhead: each indexed field has an associated in-memory data structure that the search engine keeps in order to make that field searchable.
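A deliberately simplified Python model of the point above: if each field gets its own term-to-documents structure, the number of such structures grows with the number of distinct fields, however few documents use each one. This is an illustration only, with a hypothetical `build_field_indexes` helper; it is not how Lucene or Elasticsearch actually lay out their data structures.

```python
from collections import defaultdict


def build_field_indexes(docs):
    """Build one inverted index per field: field -> term -> list of doc ids.
    Simplified model; real engines use far more compact structures."""
    indexes = defaultdict(lambda: defaultdict(list))
    for doc_id, doc in enumerate(docs):
        for field, value in doc.items():
            # Doc ids are appended in increasing order, so each list is sorted.
            indexes[field][str(value).lower()].append(doc_id)
    return indexes


docs = [
    {"size": "M", "color": "red"},
    {"size": "L", "color": "Red"},
    {"processor": "i5", "RAM": "8GB"},
]

indexes = build_field_indexes(docs)
print(len(indexes))            # 4 per-field structures
print(dict(indexes["color"]))  # {'red': [0, 1]}
```

In this toy model, a field used by a single laptop still costs a whole per-field structure, which is roughly the kind of per-field overhead described above.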

(In reply to Andy, Tuesday, June 7, 2011 at 10:29 PM.)

