Hello,

Please share your thoughts.

We have one big ES index with 18 shards (9 primaries and 9 replicas). We have thousands of customers, and a customer may have millions of documents or, at the other extreme, very few.

We never search across all customers, only within a specific customer; in other words, every query has a customer id filter. The big disadvantage of having one big index is that every search goes to the data of all customers rather than just the one customer we care about, which obviously hurts query performance.

We're thinking of creating multiple indexes, one per customer, but in our case that means hundreds or maybe thousands of indexes, which is a big maintenance overhead. The other approach is to create many more shards.

Could you please share your experience and thoughts? What would you recommend in this scenario?

Thank you in advance,
Vladi Feigin
I had a project with the same context. We decided to increase the number of shards, since having one index per customer was not feasible.

Another approach is to separate only some (hardcoded) customers from the rest. If you can identify those customers in advance, it can be a good idea to give them their own indexes and keep a "rest of the world" index for the less important ones.

Also, when we increased the number of shards we added more servers, but smaller ones, which greatly improved our failure resiliency.

Hope that helps.
This is where you use routing and aliases.

Use routing to send each customer's documents to a specific shard; you can then query with the same routing value so each search hits only that shard. Then use aliases so you can easily move larger customers out to their own index if need be.
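To make that concrete, here is a minimal sketch against the REST API of that era. The index name customers, type doc, field customer_id, and the routing value 42 are all made up for illustration:

```
# Index a document, using the customer id as the routing value so that all of
# this customer's documents land on the same shard
curl -XPUT 'localhost:9200/customers/doc/1?routing=42' -d '{
  "customer_id": "42",
  "text": "some document"
}'

# Create a per-customer alias that bakes in both the routing value and a
# filter, so applications can simply search the alias
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": {
        "index":   "customers",
        "alias":   "customer_42",
        "routing": "42",
        "filter":  { "term": { "customer_id": "42" } }
    } }
  ]
}'
```

If a large customer later outgrows the shared index, you can reindex their documents into a dedicated index and repoint the customer_42 alias at it without changing the application.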
You could get around this by using routing based on the customer ID when indexing and searching. That ensures all documents belonging to a single customer end up in the same shard, so each search for a specific customer hits a single shard instead of all 9, which scales much better.
Best regards,
Christian
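A sketch of what such a search looks like, reusing the hypothetical customers index and customer_id field from above. Note that several customers will usually hash to the same shard, so the customer_id filter is still needed even when routing is used:

```
# Search with the customer id as the routing value: only the shard that the
# routing value hashes to is queried, instead of all 9 primaries.
# The filter is kept because other customers may share that shard.
# ("filtered" is the 1.x-era syntax; later versions use a bool query
#  with a filter clause instead.)
curl -XGET 'localhost:9200/customers/_search?routing=42' -d '{
  "query": {
    "filtered": {
      "query":  { "match_all": {} },
      "filter": { "term": { "customer_id": "42" } }
    }
  }
}'
```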
Thank you, Mark.

Can you please elaborate on the routing? Do you mean using the customer id as the routing value? Can you give an example or a link?

Should I override the shard calculation function?
Thank you everybody for the help!

Is there a way to run routing in a debug mode, for example to calculate the shard id via an API?

Thank you,
Vladi
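For reference, one way to inspect this without indexing anything is the search shards API, which reports which shards a search with a given routing value would be sent to. The index name and routing value below are again hypothetical:

```
# Ask the cluster which shards a search with this routing value would hit
curl -XGET 'localhost:9200/customers/_search_shards?routing=42&pretty'
```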
It's been two years since this discussion was active, but I want to add a link to a Shay Banon lecture about this exact issue: https://vimeo.com/44716955

It could be useful for new users who run into the same problem.
Just looking at a few of the slides, and given that it's Shay presenting, I'm going to go out on a limb and say that while many of the principles in there remain true, some no longer are, and we now give different advice. That's mostly because Lucene has changed a great deal since this talk; many of the things we did back then were workarounds for Lucene behaviour that has since been updated, fixed, or changed, and they no longer apply.

My recommendation: get up-to-date recommendations from the Elasticsearch team before latching on to potentially outdated material.
I'm sorry, I didn't mean to confuse anyone. I think it would be useful to state in the video description that it is not relevant for later versions of Elasticsearch. In addition, it would be great to update Elasticsearch: The Definitive Guide for such cases. Thanks for the advice!