Field names with the same name across types having different index/type in Elasticsearch

I have detailed my question on Stack Overflow under the same title.

Here is a brief version:

I have been reading a lot about mappings in Elasticsearch, and here's
something interesting that I found:

Field names with the same name across types are highly recommended to have
the same type and same mapping characteristics (analysis settings for
example). There is an effort to allow to explicitly "choose" which field to
use by using type prefix (my_type.my_field), but it’s not complete, and there
are places where it will never work (like faceting on the field).

The quote above comes from the documentation here:
http://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

My use case is exactly that. Here's an example. Suppose that some field
in tenant 1 needs the following mapping (for a given entity, user):

{
  "tenantId1_user": {
    "properties": {
      "someField": {
        "type": "string",
        "index": "analyzed"
      }
    }
  }
}

Now, for the same field in a different tenant (and the same entity type,
let's say user), the type has to change like this:

{
  "tenantId2_user": {
    "properties": {
      "someField": {
        "type": "integer",
        "index": "analyzed"
      }
    }
  }
}

As I understand the quote above, I can technically provide this mapping,
but it is not recommended because deep down Lucene handles both fields in
the same way.

My questions are:

  1. How can I handle my use case? Should I just put each tenant in a
    separate index so I don't have to worry about this mapping?

  2. Is there any other workaround, considering that many tenants would
    mean many indices?

  3. What's the recommended way to handle this use case?

All help appreciated!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0264dafc-82e9-44fb-8193-b2661e8225a6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

You are right. I suggest using different indices for tenant 1 and tenant 2;
this also helps separate other concerns (such as index term statistics,
scoring, field faceting, and deleting docs).
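
A minimal sketch of the index-per-tenant layout, assuming one index per tenant and entity. The helper names (tenant_index_name, tenant_index_body) and the naming scheme are hypothetical, and the code only builds the create-index request bodies as plain dicts; sending them to the cluster is left out. The numeric type is written as "integer".

```python
def tenant_index_name(tenant_id: str, entity: str = "user") -> str:
    """Return a hypothetical per-tenant index name, e.g. 'tenantid1_user'."""
    # Elasticsearch index names must be lowercase.
    return f"{tenant_id}_{entity}".lower()


def tenant_index_body(entity: str, field_type: str) -> dict:
    """Build a create-index body holding one type with one field."""
    return {
        "mappings": {
            entity: {
                "properties": {
                    "someField": {"type": field_type},
                }
            }
        }
    }


# Tenant-specific type for someField, following the question's example.
field_types = {"tenantId1": "string", "tenantId2": "integer"}

# One index per tenant, so the two someField definitions never share an index.
indices = {
    tenant_index_name(tenant): tenant_index_body("user", field_type)
    for tenant, field_type in field_types.items()
}

for name, body in indices.items():
    print(name, body)
```

Because each tenant's mapping lives in its own index, someField can be a string for one tenant and an integer for the other without the two definitions ever meeting.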

As a matter of fact, it is not Lucene that stands in the way. Internally, ES
keeps a hash map of field names across types, so correct field name lookup
becomes a challenge when one field name denotes two different field
specifications in the same index.
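
The lookup problem can be illustrated with a toy model. This is not Elasticsearch's actual code, only a sketch that assumes a single per-index table keyed by bare field name; find_conflicting_fields is a hypothetical helper that reports the names whose definitions differ between types.

```python
from collections import defaultdict


def find_conflicting_fields(index_mappings: dict) -> dict:
    """Return {field_name: {type_name: definition}} for fields whose
    definitions differ between the types of one index."""
    seen = defaultdict(dict)
    for type_name, type_mapping in index_mappings.items():
        for field_name, definition in type_mapping.get("properties", {}).items():
            seen[field_name][type_name] = definition

    conflicts = {}
    for field_name, by_type in seen.items():
        definitions = list(by_type.values())
        # A bare field name is ambiguous if any two types disagree on it.
        if any(d != definitions[0] for d in definitions[1:]):
            conflicts[field_name] = by_type
    return conflicts


# The two types from the question, placed in the same index.
mappings = {
    "tenantId1_user": {
        "properties": {"someField": {"type": "string", "index": "analyzed"}}
    },
    "tenantId2_user": {
        "properties": {"someField": {"type": "integer", "index": "analyzed"}}
    },
}

conflicts = find_conflicting_fields(mappings)
print(sorted(conflicts))  # ['someField']
```

A check like this could run in the application layer before a mapping is submitted, to decide whether a tenant can share an index or needs its own.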

Jörg

Wouldn't that be a bit too much, though? If we have thousands of customers
(tenants), we would have to create an index for each of them. Wouldn't that
affect performance, and wouldn't maintaining that many indices in the
cluster be a bit too much?

If you have thousands of tenants with thousands of potentially overlapping
mappings that should operate independently, then yes, the hardware sizing of
the cluster is a challenge.

OTOH you can play tricks in your search/index front-end API if you can hide
ES internals from the customers, e.g. by prefixing field names with the
tenant ID so that field names become unique. I would not call this a
recommended method, though, because ES should be able to handle overlapping
mappings in a more feasible way.
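
A minimal sketch of the prefixing trick, assuming a front-end layer that rewrites documents on the way in and out. The function names and the "_" separator are hypothetical choices, and only top-level fields are handled.

```python
def prefix_fields(tenant_id: str, document: dict) -> dict:
    """Rename every top-level field to '<tenant_id>_<field>' before indexing."""
    return {f"{tenant_id}_{name}": value for name, value in document.items()}


def strip_prefix(tenant_id: str, document: dict) -> dict:
    """Undo prefix_fields on a document returned from a search."""
    prefix = f"{tenant_id}_"
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in document.items()
    }


# The same logical field is stored under a different name per tenant,
# so each name can carry its own mapping.
doc_tenant1 = prefix_fields("tenantId1", {"someField": "some text"})
doc_tenant2 = prefix_fields("tenantId2", {"someField": 42})

print(doc_tenant1)  # {'tenantId1_someField': 'some text'}
print(doc_tenant2)  # {'tenantId2_someField': 42}
print(strip_prefix("tenantId2", doc_tenant2))  # {'someField': 42}
```

Queries, sort clauses, and facets would need the same rewriting, which is part of why this stays a workaround rather than a recommendation; the number of distinct fields in the shared mapping also grows with every tenant.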

Jörg
