When is one shard enough

inline

On Fri, Dec 13, 2013 at 7:26 PM, Nikolas Everett nik9000@gmail.com wrote:

Mostly around what index size makes sense at the Lucene level, but
obviously that depends on your analyzers etc. Moving shards around starts
getting too costly to do at around 15GB. We've had a cluster that took days
to stabilize when it got unstable because of larger indexes.

Neat! Well, most wikis are getting squished to one shard then. What kind
of thing about my choice of analyzer would suggest needing larger or
smaller shards?

We use custom analyzers with stemming, which doubles the number of terms
(multiple terms at the same position), and some analyzers add even more than
that. Think SynonymFilter and similar. The same happens if you index the same
data multiple times. This bloats the index.
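
To make the "multiple terms at the same position" point concrete, here is a toy sketch. It is not Lucene code: the synonym table and the suffix-stripping "stemmer" are made up for illustration. It only shows how keeping the original term alongside its stem and synonyms multiplies the number of indexed terms.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym table; a stand-in for something like SynonymFilter.
SYNONYMS: Dict[str, List[str]] = {"quick": ["fast"]}


def naive_stem(term: str) -> str:
    """Strip a trailing 's' as a crude stand-in for a real stemmer."""
    return term[:-1] if len(term) > 3 and term.endswith("s") else term


def analyze(text: str) -> List[Tuple[int, str]]:
    """Return (position, term) pairs, stacking extra terms on the same position."""
    out: List[Tuple[int, str]] = []
    for pos, word in enumerate(text.lower().split()):
        terms = [word]
        stem = naive_stem(word)
        if stem != word:
            terms.append(stem)  # keep both the original and the stem
        terms.extend(SYNONYMS.get(word, []))  # synonyms share the position
        out.extend((pos, term) for term in terms)
    return out


tokens = analyze("quick foxes jumps")
positions = {pos for pos, _ in tokens}
# Three input words occupy three positions but produce six indexed terms.
print(len(positions), len(tokens))
```

Every extra term per position adds postings to the index, which is why the same source text can produce quite different index sizes under different analyzers.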

Do you have a sense as to how many shards is too many? I assume search
and indexing would speed up until you get to more shards than you have
nodes. After that increasing shards would slow down search and indexing
but keep the shards to a more manageable size. I remember reading about
people complaining about performance when their index spanned thousands of
shards and they wanted to search them all. I won't get that big, but could
imagine fifty or a hundred.

Not really; I believe this is tightly coupled to your data and expected set
of queries. The ultimate idea behind sharding is that a query can execute on
only some of the shards and still get full results, because the other shards
would return 0 results. The sharding key / function would have to try to
achieve that.
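
A minimal sketch of that idea, under assumptions that are mine and not from the thread: documents are routed by a hypothetical "wiki" field, and CRC32 stands in for whatever hash the search engine really uses. Because all documents with the same key land on one shard, a query for that key only has to visit that shard.

```python
import zlib
from collections import defaultdict
from typing import Dict, List

NUM_SHARDS = 3  # assumed shard count, for illustration only


def shard_for(routing_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing key to a shard with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def index_docs(docs: List[dict]) -> Dict[int, List[dict]]:
    """Place each document on the shard chosen by its 'wiki' routing key."""
    shards: Dict[int, List[dict]] = defaultdict(list)
    for doc in docs:
        shards[shard_for(doc["wiki"])].append(doc)
    return shards


def search(shards: Dict[int, List[dict]], wiki: str, word: str) -> List[dict]:
    """Query only the one shard that can hold documents for this wiki."""
    target = shards.get(shard_for(wiki), [])
    return [d for d in target if d["wiki"] == wiki and word in d["text"]]


docs = [
    {"wiki": "enwiki", "text": "shard sizing"},
    {"wiki": "dewiki", "text": "shard routing"},
    {"wiki": "enwiki", "text": "analyzers"},
]
shards = index_docs(docs)
hits = search(shards, "enwiki", "shard")
```

The trade-off is that a query without a usable routing key still has to fan out to every shard, so this only helps when most queries carry the key.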

FWIW I have indexed the English Wikipedia on my MacBook Air and it takes
about 28GB. I could probably reduce that further, since I didn't use stemming
or ASCIIFoldingFilter. It's perfectly fine to have this as one shard assuming
the growth rate is moderate. Just to be on the safe side you can split it
into 3 shards. To the best of my knowledge that's the largest wiki there is,
so that's your worst-case scenario.
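
As a rough back-of-the-envelope check, the arithmetic can be sketched as below. It treats the roughly 15GB figure mentioned earlier in this thread as a soft ceiling per shard (a rule of thumb from this discussion, not a hard limit) and assumes documents spread evenly across shards. The function names are made up for illustration.

```python
import math


def min_shards(index_gb: float, max_shard_gb: float) -> int:
    """Smallest shard count that keeps each shard at or under max_shard_gb."""
    return max(1, math.ceil(index_gb / max_shard_gb))


def shard_size_gb(index_gb: float, num_shards: int) -> float:
    """Approximate size of each shard, assuming an even spread of documents."""
    return index_gb / num_shards


needed = min_shards(28, 15)       # shards needed to stay under ~15GB each
per_shard = shard_size_gb(28, 3)  # size of each shard with the suggested 3

print(needed, round(per_shard, 1))
```

With these numbers, 3 shards leaves each one at a bit over 9GB, comfortably below the point where moving shards around was said to get costly.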

Thanks!

Nik

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAPmjWd0Sd3cx2pZ%2B%2B%3DwqASRc_E%2BxmFZL3kqX0Zrb7HnRG06VVA%40mail.gmail.com
.

For more options, visit https://groups.google.com/groups/opt_out.
