We have started noticing that query performance is beginning to suffer for some of our datasets, which span the roughly one year of data we keep online. We are looking into optimizations we can make to our index/shard configuration, and I was wondering whether there is a preferable way to configure our indexes and shards. Right now we create a new index each week with 22 shards per index (we have 22 data nodes). Would it be more optimal to reduce the number of indexes (index by month) and have larger shards? Our documents are only kilobytes in size, so they are not all that big; we just have a lot of them.
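For reference, the weekly setup described above could be expressed as an index template so every new weekly index picks up the same shard count automatically. This is only a sketch against the 1.x-era REST API; the template name, index pattern (`logs-*`), and replica count are placeholders, not from the original post:

```shell
# Hypothetical template: any index whose name matches logs-* is created
# with 22 primary shards (one per data node, per the setup above).
curl -XPUT 'localhost:9200/_template/weekly_logs' -d '{
  "template": "logs-*",
  "settings": {
    "number_of_shards": 22,
    "number_of_replicas": 1
  }
}'
```

Switching to monthly indexes would only require changing the naming scheme used at index-creation time; the template itself could stay the same.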
The feedback we typically get from support is just "test and see." That is something we can do, but it would take a fair amount of effort and time only to find out that it gives us no benefit. I was hoping some of the more experienced folks here could offer input on possible approaches. If all else fails, we can always test different configs.
Thanks for the reply! We have roughly 13 TB of data and about 40 indexes (one index per week). Each index has 22 shards (one for every data node).
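Putting the numbers above together gives a rough sense of the shard sizes involved. This is back-of-the-envelope arithmetic assuming data is spread evenly across indexes and shards, which the original post does not state:

```python
# Figures from the thread: ~13 TB total, 40 weekly indexes, 22 shards each.
total_gb = 13 * 1024          # ~13 TB expressed in GB
shards_per_index = 22

# Current weekly layout.
weekly_indexes = 40
weekly_shard_gb = total_gb / weekly_indexes / shards_per_index

# Hypothetical monthly layout: ~12 indexes for a year of data.
monthly_indexes = 12
monthly_shard_gb = total_gb / monthly_indexes / shards_per_index

print(f"weekly:  ~{weekly_shard_gb:.0f} GB per shard")   # ~15 GB
print(f"monthly: ~{monthly_shard_gb:.0f} GB per shard")  # ~50 GB
```

So moving to monthly indexes would roughly triple the shard size while cutting the total shard count from about 880 to about 264, which changes the trade-off between per-shard overhead and per-shard query cost.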
On Monday, January 5, 2015 2:27:24 PM UTC-8, Mark Walkom wrote:
One shard per node is ideal, as it spreads the load.
Reducing the shard count can help, but it depends on a few things.
How much data do you have in your cluster, and how many indexes?