Hello,
We currently use another search engine and are considering a move to
Elasticsearch. After quite a lot of reading, I am still not sure what the
best setup would be in our case, especially after reading that the number
of shards can NOT be changed once an index is created.
Our product is hosted in a cloud environment and has a rapidly growing
number of users. Each user is given a varying amount of disk space (from
several gigabytes to hundreds of gigabytes) to import their datasets. We
index these datasets with a fixed number of fields and, by design, the
fields are the same for all of them. For security reasons, each user can
only search their own imported datasets (the data is segregated), so there
is never a need to query the entire index. Query time matters much more to
us than indexing time. Our current query time is about 10 to 40 ms.
It is crucial for us to be able to scale out horizontally and smoothly.
If everything goes into one index with one type, I worry that indexing and
search will get slower and slower as the index grows.
So I plan to split the data to speed up queries. Here are the options I see:
1. Use one index and create a type for each user, so that a user's query
runs directly against their own type. But since the number of users can
exceed a million, can Elasticsearch handle a million types in one index?
2. Group users into different indices, so that indexing and queries are
dispatched to a smaller index. But this means our application has to
handle the complexity of scaling out horizontally.
Is either option workable? Which would you recommend?
Also, how many shards should one index have as a best practice? Does having
too many shards hurt performance as well?
Hi David,
The documentation you pointed to is exactly what I was looking for. It is
really helpful and shows what sets Elasticsearch apart on scalability.
I like the tips in "faking index per user with aliases" very much, but
since that approach routes each request to a single shard, I just want to
double-check with you: can multiple users share the same shard?
Thanks,
Cindy
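For readers following along, the "faking index per user with aliases" setup mentioned above could be sketched roughly as follows. This is only a minimal sketch: the index name `shared_index`, the `user_id` field, and the `user_<id>` alias naming scheme are hypothetical and do not come from this thread. The body follows the shape accepted by the `_aliases` endpoint (an `add` action carrying a `filter` and a `routing` value); the snippet only builds and prints the JSON and does not send it to a cluster.

```python
import json


def user_alias_action(index, user_id, field="user_id"):
    """Build one 'add' action for an _aliases request.

    The alias filters searches down to one user's documents and routes
    both indexing and search requests by that user's id.
    """
    return {
        "add": {
            "index": index,                        # hypothetical shared index
            "alias": "user_" + user_id,            # hypothetical naming scheme
            "filter": {"term": {field: user_id}},  # only this user's documents
            "routing": user_id,                    # keep the user on one shard
        }
    }


# One request body can carry many actions; here it holds a single user.
body = {"actions": [user_alias_action("shared_index", "cindy")]}
print(json.dumps(body, indent=2))
```

The application would then index and search through `user_cindy` as if it were a dedicated index, while all users' data actually lives in the one shared index.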
On Wednesday, 14 January 2015 06:23:07 UTC-5, David Pilato wrote:
On 14 Jan 2015, at 02:04, 'Cindy' via elasticsearch <elasti...@googlegroups.com> wrote:
The mapping from users to shards is completely arbitrary. For instance,
users whose usernames start with A-F could be routed to shard 1, G-M to
shard 2, and so on. So the data for users Ed, Cindy, and David can all live
in shard 1, while user Greg's data lives in shard 2.
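To make the point about shared shards concrete, here is a small sketch. Elasticsearch does not use alphabetical ranges; it picks the shard by hashing the routing value and taking it modulo the number of primary shards. The hash below (`zlib.crc32`) is only a stand-in, not the function Elasticsearch uses internally, and the user names and shard count are made up for illustration.

```python
import zlib


def shard_for(routing_value, num_primary_shards):
    """Return a shard number as hash(routing) % num_primary_shards.

    crc32 is a stand-in hash chosen because it is deterministic and in
    the standard library; Elasticsearch uses its own hash internally.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


users = ["ed", "cindy", "david", "greg", "alice", "bob"]  # made-up users
num_shards = 3                                            # made-up shard count

# Group users by the shard their routing value maps to.
placement = {}
for user in users:
    placement.setdefault(shard_for(user, num_shards), []).append(user)

for shard, names in sorted(placement.items()):
    print(shard, names)
```

With more users than shards, some users necessarily land on the same shard, which is the normal case. The modulo also shows why the number of primary shards is fixed at index creation: changing it would change where every existing routing value maps.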
On Wednesday, January 14, 2015 at 12:14:50 PM UTC-8, Cindy wrote:
Could you please tell me which Java API corresponds to the following
REST API? I have searched quite a lot but cannot find an example of how
to do it with the Java API.
On Wednesday, 14 January 2015 15:21:29 UTC-5, Ed Kim wrote: