Shards vs indexes vs cluster


(Scott Decker) #1

Hey all,
Still trying to track down our performance issues and had another
question.

we have 1 cluster, but within it we have multiple indexes for different
products.
So, we have multiple content indexes, multiple topic search indexes,
multiple other indexes, etc.

So, a cluster is composed of something like
1 - content (alias)
-- about 5 indexes, each has 4 shards 1 replica
2 - source (alias)
-- about 12 indexes, each has 4 shards 1 replica
3 - lookup (alias)
-- about 12 indexes, each has 4 shards 1 replica

and so on. we have about 5 "aliases" setup that map to different index
groups.
This ultimately gives us about around 120 shards across 4 servers.

Now, how we had been looking at it was just by a single "alias" with it's
indexes and shard/replica setup.
But, had not looked at it holistically as an entire cluster.

So, maybe this is too many shards for 4 servers? roughly 8gigs of ram 4 cpus

Only 2 "alias" groupings do a lot of work. our content and lookup groupings.
Maybe we need to break those up into their own cluster with their own
servers?

Or, should having 120 or so shards on 4 servers not really be an issue?

Thanks,
Scott


(Shay Banon) #2

You might end up searching on more shards then a 4 node cluster can
sustain. How many documents do you have? Do you really need 12 indexes for
the lookup alias for example?

On Fri, Jun 8, 2012 at 8:19 PM, Scott Decker scott@publishthis.com wrote:

Hey all,
Still trying to track down our performance issues and had another
question.

we have 1 cluster, but within it we have multiple indexes for different
products.
So, we have multiple content indexes, multiple topic search indexes,
multiple other indexes, etc.

So, a cluster is composed of something like
1 - content (alias)
-- about 5 indexes, each has 4 shards 1 replica
2 - source (alias)
-- about 12 indexes, each has 4 shards 1 replica
3 - lookup (alias)
-- about 12 indexes, each has 4 shards 1 replica

and so on. we have about 5 "aliases" setup that map to different index
groups.
This ultimately gives us about around 120 shards across 4 servers.

Now, how we had been looking at it was just by a single "alias" with it's
indexes and shard/replica setup.
But, had not looked at it holistically as an entire cluster.

So, maybe this is too many shards for 4 servers? roughly 8gigs of ram 4
cpus

Only 2 "alias" groupings do a lot of work. our content and lookup
groupings.
Maybe we need to break those up into their own cluster with their own
servers?

Or, should having 120 or so shards on 4 servers not really be an issue?

Thanks,
Scott


(Scott Decker) #3

On the large number of indexes, we do that because each one is a certain
subset of our data. If we update that subset in our database, then we just
need to create a new index for that subset, and then remove the old one
from an alias, and add the new one to the alias.
Its mainly to avoid messing up indexes that production is all ready using.

To do it all in 1 index, means that we have to recreate all data in the
index. there are around 2-3 million records in it all, and that takes quite
a bit to update all in one big update.
Once the indexes are created though, we maybe only update 2 or 3 of them
each day and swap them out of the aliases. We optimize them into 1 segment
for faster searching.
Maybe these just need to be 1 shard 1 replica, instead of 4 shard 1
replica?

all of our queries go against aliases, and most of our queries go against
our lookup alias and our content alias. I was just wondering if maybe
having so many shards on the nodes may impact something, even though we
aren't querying all of those shards, just specific aliases, that then do
queries against specific indexes.

On Friday, June 8, 2012 11:19:15 AM UTC-7, Scott Decker wrote:

Hey all,
Still trying to track down our performance issues and had another
question.

we have 1 cluster, but within it we have multiple indexes for different
products.
So, we have multiple content indexes, multiple topic search indexes,
multiple other indexes, etc.

So, a cluster is composed of something like
1 - content (alias)
-- about 5 indexes, each has 4 shards 1 replica
2 - source (alias)
-- about 12 indexes, each has 4 shards 1 replica
3 - lookup (alias)
-- about 12 indexes, each has 4 shards 1 replica

and so on. we have about 5 "aliases" setup that map to different index
groups.
This ultimately gives us about around 120 shards across 4 servers.

Now, how we had been looking at it was just by a single "alias" with it's
indexes and shard/replica setup.
But, had not looked at it holistically as an entire cluster.

So, maybe this is too many shards for 4 servers? roughly 8gigs of ram 4
cpus

Only 2 "alias" groupings do a lot of work. our content and lookup
groupings.
Maybe we need to break those up into their own cluster with their own
servers?

Or, should having 120 or so shards on 4 servers not really be an issue?

Thanks,
Scott


(Shay Banon) #4

I see, I think I got the picture :). Then maybe using a smaller number of
shards will make more sense.

On Tue, Jun 12, 2012 at 5:19 PM, Scott Decker scott@publishthis.com wrote:

On the large number of indexes, we do that because each one is a certain
subset of our data. If we update that subset in our database, then we just
need to create a new index for that subset, and then remove the old one
from an alias, and add the new one to the alias.
Its mainly to avoid messing up indexes that production is all ready using.

To do it all in 1 index, means that we have to recreate all data in the
index. there are around 2-3 million records in it all, and that takes quite
a bit to update all in one big update.
Once the indexes are created though, we maybe only update 2 or 3 of them
each day and swap them out of the aliases. We optimize them into 1 segment
for faster searching.
Maybe these just need to be 1 shard 1 replica, instead of 4 shard 1
replica?

all of our queries go against aliases, and most of our queries go against
our lookup alias and our content alias. I was just wondering if maybe
having so many shards on the nodes may impact something, even though we
aren't querying all of those shards, just specific aliases, that then do
queries against specific indexes.

On Friday, June 8, 2012 11:19:15 AM UTC-7, Scott Decker wrote:

Hey all,
Still trying to track down our performance issues and had another
question.

we have 1 cluster, but within it we have multiple indexes for different
products.
So, we have multiple content indexes, multiple topic search indexes,
multiple other indexes, etc.

So, a cluster is composed of something like
1 - content (alias)
-- about 5 indexes, each has 4 shards 1 replica
2 - source (alias)
-- about 12 indexes, each has 4 shards 1 replica
3 - lookup (alias)
-- about 12 indexes, each has 4 shards 1 replica

and so on. we have about 5 "aliases" setup that map to different index
groups.
This ultimately gives us about around 120 shards across 4 servers.

Now, how we had been looking at it was just by a single "alias" with it's
indexes and shard/replica setup.
But, had not looked at it holistically as an entire cluster.

So, maybe this is too many shards for 4 servers? roughly 8gigs of ram 4
cpus

Only 2 "alias" groupings do a lot of work. our content and lookup
groupings.
Maybe we need to break those up into their own cluster with their own
servers?

Or, should having 120 or so shards on 4 servers not really be an issue?

Thanks,
Scott


(system) #5