Scalability questions

Hello

We currently use another search engine and are considering moving to
Elasticsearch. After quite a lot of reading, I am still not sure what the
optimal design would be in our case, especially after reading that the
number of shards can NOT be changed once the index is created.

Our product is hosted in a cloud environment and has a rapidly growing
number of users. Each user is given a varying amount of disk space (from
several gigabytes to hundreds of gigabytes) to import their datasets. We
index these datasets with a fixed number of fields, and the fields are the
same for all datasets. For security reasons, each user can only search
their own imported datasets (the data is segregated), so there is no need
to query the entire index, and query time is much more important than
indexing time. Our current query time is about 10 to 40 ms.

It is crucial for us to be able to scale out horizontally and smoothly.

If everything goes into one index with one type, I worry that indexing and
search will get slower and slower as the index grows.

So I plan to split the data to speed up queries. Here are some options:

  1. Use one index and create a type for each user, so that each user's
    queries go directly against their own type. But since the number of
    users can exceed a million, can Elasticsearch handle a million
    types in one index?
  2. Group users into different indices, so that indexing and queries are
    dispatched to a smaller index. But this means our application has to
    handle the complexity of scaling out horizontally.

Is either option feasible? Which would you recommend?

Also, could you please tell me how many shards an index should have as a
best practice? Do too many shards also hurt performance?

Many thanks,

Cindy

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a2adcd16-1c7b-4e78-a131-d9ae4d61379b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

I think I would start by reading this: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/kagillion-shards.html
This: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/user-based.html
and this: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/faking-it.html

Actually, the full chapter: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scale.html :)

HTH

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

Hi David,

The documentation you pointed to is exactly what I was looking for. It is
really helpful and shows how well Elasticsearch handles scalability :)

I like the tips in "faking index per user with aliases" very much, but
since that approach basically routes each request to a single shard, I just
want to double-check with you whether multiple users can share the same shard.

Thanks,
Cindy

Yes. Many users will share the same shard.

David

The shard identification/routing is completely arbitrary. For instance,
users whose usernames start with A-F can be routed to shard 1, G-M to shard
2, etc. So you can imagine that the data of users Ed, Cindy and David lives
in shard 1, while user Greg has his data in shard 2.
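
To make the shared-shard idea concrete, here is a minimal Java sketch. It assumes the rule described in the linked guide, where a shard is chosen as hash(routing) % number_of_primary_shards. `String.hashCode()` is only a stand-in for Elasticsearch's internal hash function, so the shard numbers on a real cluster will differ. The class name `RoutingSketch`, the shard count of 5, and the extra user names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RoutingSketch {
    // Picks a shard for a routing value: hash(routing) % number of primary shards.
    // String.hashCode() is an assumption standing in for Elasticsearch's own hash.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    // Groups routing values (here: user names) by the shard they map to.
    static Map<Integer, List<String>> groupByShard(List<String> users, int shards) {
        Map<Integer, List<String>> byShard = new TreeMap<>();
        for (String user : users) {
            byShard.computeIfAbsent(shardFor(user, shards), k -> new ArrayList<>()).add(user);
        }
        return byShard;
    }

    public static void main(String[] args) {
        List<String> users =
                Arrays.asList("Ed", "Cindy", "David", "Greg", "Alice", "Bob", "Carol");
        // There are more users than shards, so at least one shard holds several users.
        groupByShard(users, 5).forEach((shard, names) ->
                System.out.println("shard " + shard + " -> " + names));
    }
}
```

Because the same routing value always maps to the same shard, a query routed by user name only has to visit that one shard, even though other users' documents live there too. That is why the alias approach pairs the routing value with a filter.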

Could you please tell me which Java API call corresponds to the following
REST request? I have searched quite a lot but have not been able to find an
example of how to do it with the Java API.

PUT /forums/_alias/baking
{
  "routing": "baking",
  "filter": {
    "term": {
      "forum_id": "baking"
    }
  }
}

Many thanks,

Cindy
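
One version-independent way to do this from Java is to send the same REST request over HTTP using only the JDK, as sketched below. This is not the native Java client API: the class name `AliasRequestSketch`, its helper methods, and the example base URL `http://localhost:9200` are assumptions for illustration. The native Java client also offers alias operations through its indices admin API, but method names differ between versions, so it is worth checking the Javadoc for the release in use.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class AliasRequestSketch {
    // Builds the request path: /{index}/_alias/{alias}
    static String aliasPath(String index, String alias) {
        return "/" + index + "/_alias/" + alias;
    }

    // Builds the JSON body: a routing value plus a term filter on one field.
    // Values are inserted as-is, so they must not contain quotes or backslashes.
    static String aliasBody(String routing, String field, String value) {
        return "{\"routing\":\"" + routing + "\","
             + "\"filter\":{\"term\":{\"" + field + "\":\"" + value + "\"}}}";
    }

    // Sends the PUT request and returns the HTTP status code.
    static int putAlias(String baseUrl, String index, String alias, String body)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(baseUrl + aliasPath(index, alias)).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String body = aliasBody("baking", "forum_id", "baking");
        System.out.println("PUT " + aliasPath("forums", "baking"));
        System.out.println(body);
        // Only contact a cluster when a base URL is passed, e.g. http://localhost:9200
        if (args.length > 0) {
            System.out.println("HTTP status: " + putAlias(args[0], "forums", "baking", body));
        }
    }
}
```

Run without arguments, the program only prints the request it would send. Passing a base URL as the first argument sends it to that cluster.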
