Recommended maximum fields per index


(maho) #1

Hi,

is there a maximum number of fields per index that you should not exceed
because of performance issues?

In my case I have 10 types per index and 150 fields per type,
which adds up to 1500 fields per index.

And secondly, is it possible to define type-independent fields that
apply to all types, to reduce the number of fields per index?

Thanks.


(Lukáš Vlček) #2

Hi,

As for the second question you can simply use index templates:
http://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html
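
As a rough sketch of the template idea (not from the original post): the snippet below builds a template body that repeats the same field definitions under every type. It assumes the legacy template layout with a `template` pattern and a `mappings` section; the index pattern, type names, and field names are all hypothetical, so check the reference above for the exact format in your version.

```python
import json

# Hypothetical shared field definitions; names and types are illustrative only.
SHARED_FIELDS = {
    "created_at": {"type": "date"},
    "title": {"type": "string"},
}


def build_template(index_pattern, type_names, shared_fields):
    """Return a template body that repeats the same field definitions
    under every listed type (assumed legacy layout, see the note above)."""
    mappings = {
        type_name: {"properties": dict(shared_fields)}
        for type_name in type_names
    }
    return {"template": index_pattern, "mappings": mappings}


# Assumed index pattern and type names, for illustration only.
template = build_template("logs-*", ["tweet", "user"], SHARED_FIELDS)
print(json.dumps(template, indent=2, sort_keys=True))
```

The printed JSON is what you would send as the template body; generating it from one shared dictionary keeps the per-type definitions identical.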

I am not sure about the impact of a high number of document fields, but I think
that 1500 fields per index should be OK (it probably comes down to the content
of your data, so you should try it and measure). Generally, Lucene is
designed to allow a unique set of fields per document in one index.

Regards,
Lukas



(Shay Banon) #3

In elasticsearch, a field named "x" in different types is considered the same field in Lucene. The number of fields mainly affects memory usage (for search). The option to store _source means that you don't have to store each field individually and can get them all at once.
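
To illustrate the first point (a sketch that is not part of the original reply; the type and field names are made up): if same-named fields collapse into one Lucene field, the number that matters is the count of distinct field names across all types, which can be lower than the sum of the per-type counts.

```python
# Hypothetical per-type field lists; the type and field names are made up.
fields_by_type = {
    "tweet": ["id", "created_at", "message"],
    "user": ["id", "created_at", "name"],
}


def count_fields(fields_by_type):
    """Return (sum of per-type field counts, number of distinct field names)."""
    per_type_total = sum(len(set(fields)) for fields in fields_by_type.values())
    distinct = set()
    for fields in fields_by_type.values():
        distinct.update(fields)
    return per_type_total, len(distinct)


per_type_total, distinct_total = count_fields(fields_by_type)
print(per_type_total, distinct_total)  # prints "6 4"
```

By the same reasoning, 10 types with 150 fields each only add up to 1500 distinct fields if no field name is shared between types.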


(maho) #4

Hi Shay,

that's interesting... so you should not define fields with the same
name but different data types in different index types?



(Shay Banon) #5

Yes, that can lead to strange behavior, unless you take special care (in some features that support it) to use type_name.field_name. In general, it's not recommended.


(plaflamme) #6

Hi Shay,

Does this appear in the documentation? I looked quickly and didn't find
anything that warned users against doing this. This may seem obvious to a
developer but I don't think it is for users. Types "seem" to segment the
index into independent portions, but in fact they are very closely related
(much more so than it appears).

Should I open an issue for documenting this? I'd be glad to contribute some
documentation also.

Philippe



(Lukáš Vlček) #7

Please go ahead, Philippe, and open a ticket. If you want to contribute some
content, that would be warmly welcomed!
Lukas



(plaflamme) #8

Done: https://github.com/elasticsearch/elasticsearch/issues/927

I'd be happy to
contribute the documentation for this, but I'd need to know what ES expects.
Shay mentioned that they should have the same type. Is this the only
recommendation/requirement? Should the mapping be identical or only
compatible?

Philippe



(Lukáš Vlček) #9

I haven't tried this myself but I would say that the fields must be of the
same type.
A couple of relevant email threads:
http://elasticsearch-users.115913.n3.nabble.com/Searching-across-types-tp1745420p1745420.html
http://elasticsearch-users.115913.n3.nabble.com/Mapping-namespace-problem-td2494252.html



(Shay Banon) #10

It is recommended that the mappings be the same. Support is kind of halfway there right now: I started (way back) to try to support it better when the type is explicitly specified in the relevant queries and filters, but it does not work when the fields have different analyzers. And, in any case, you would need to explicitly specify the type, for example:

{ "term" : { "my_type.my_field" : "value" }}.

It does not work, though, when you execute a search on that type alone (/my_index/my_type/_search), i.e., the type from the URL is not propagated down to the query.

Those are the two main things that I can think of that are missing. But even with those, things like facets will become problematic (since there can be only one "type" for that field when faceting on it). So, in any case, even with those issues "fixed", it's recommended that the fields have the same mapping (analyzer, type, and so on).
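
One way to act on this recommendation is to check your mappings for same-named fields whose definitions differ before creating the index. The following is only a sketch, not an elasticsearch API: `find_conflicts` is a hypothetical helper, and the mappings are made-up dictionaries in the usual `properties` layout.

```python
def find_conflicts(mappings):
    """Return {field_name: {type_name: definition}} for every field whose
    definition differs between at least two types."""
    seen = {}
    for type_name, mapping in mappings.items():
        for field_name, definition in mapping.get("properties", {}).items():
            seen.setdefault(field_name, {})[type_name] = definition
    conflicts = {}
    for field_name, by_type in seen.items():
        definitions = list(by_type.values())
        if any(d != definitions[0] for d in definitions[1:]):
            conflicts[field_name] = by_type
    return conflicts


# Hypothetical mappings: "status" is a string in one type and an integer in the other.
mappings = {
    "order": {"properties": {"status": {"type": "string"}, "total": {"type": "double"}}},
    "ticket": {"properties": {"status": {"type": "integer"}, "total": {"type": "double"}}},
}

conflicts = find_conflicts(mappings)
print(sorted(conflicts))  # prints "['status']"
```

The comparison is on the whole definition, so a field with the same type but a different analyzer would also be reported.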

