Replies inline.
On Fri, Dec 13, 2013 at 12:20 AM, Nikolas Everett <nik9000@gmail.com> wrote:
On Thu, Dec 12, 2013 at 4:26 PM, Itamar Syn-Hershko <itamar@code972.com> wrote:
Why are you sharding in the first place? Unless you know for sure it will grow
very big, I'd skip this altogether. Planning in advance to avoid possible
resharding later can easily become over-engineering.
I'll take that as a yes. I'm not sure if I'd consider accepting the
defaults to be over-engineering though.
I've heard Simon say more than once that some of those defaults are
actually pretty bad.
Depending on usage, I'd keep each shard to a dozen or two GB.
That's an ideal size in my experience.
That seems useful. Do you base that recommendation on the ease of moving
shards around?
Mostly on what index size makes sense at the Lucene level, though
obviously that depends on your analyzers etc. Moving shards around starts
getting too costly at around 15 GB. We've had a cluster with larger indexes
that took days to stabilize after it became unstable.
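As a quick way to see where your indexes fall relative to that 15 GB figure, the cat shards API lists each shard with its on-disk size. This is only a sketch; it assumes a node reachable at localhost:9200:

```shell
# List every shard with its index, primary/replica role, state, and store size.
# Assumes an Elasticsearch node listening on localhost:9200.
curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,store'
```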
By that logic the vast majority of the indexes I have now should be
squished to one shard. None of them grow that quickly and re-sharding
them later really isn't a problem.
If you want to go the virtual-shard route nevertheless, try using a
shard key that will leave you with empty shards now and that you can bias
later (for example, by using dates).
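To illustrate that suggestion: custom routing hashes the routing value to pick a shard, so a date-based key keeps future periods on shards that are still empty today. A minimal sketch, with made-up index, type, and document names:

```shell
# Index a document with an explicit date-based routing key (hypothetical
# index "myindex" and type "event"); all documents sharing the routing
# value "2013-12" land on the same shard.
curl -s -XPUT 'localhost:9200/myindex/event/1?routing=2013-12' -d '{
  "title": "example event",
  "created": "2013-12-13"
}'
```

Note that queries then need to pass the same `routing` value to hit only that shard.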
I assume you mean custom routing. I can't really use that, as there aren't
any good routing keys. None of the candidates break the data into small
enough buckets.
If you mean indexing documents based on time or some other key
à la logstash, that kind of works for me and I already do it. I do it
because it improves the output of the suggesters, since each index has
very different term frequencies. I don't get to delete any of the indexes
over time, though, because they aren't time-based.
I meant routing. Unless you really handle lots of data, time sliced
indexing is an alternative to sharding in most cases. Combining both
(instead of using aliases, for example) only makes sense if you have to use
some predefined slice size (like daily) and you have tons of data coming in.
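For completeness, the time-sliced pattern discussed above is usually paired with an alias so searches span every slice while the application only ever addresses one name. A sketch with hypothetical index and alias names:

```shell
# Create a monthly index (name is hypothetical) ...
curl -s -XPUT 'localhost:9200/events-2013-12'

# ... and add it to the "events" alias, so searches against "events"
# fan out across every monthly slice registered on the alias.
curl -s -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": { "index": "events-2013-12", "alias": "events" } }
  ]
}'
```

Old slices can later be dropped by deleting the index, which is far cheaper than deleting documents, though, as noted above, that only pays off when the data is actually time-based.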
Nik
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAPmjWd3CCcc4UUUe%2Bog-EMqh6DRi_DD%2BDLDiGg2udNDX2BTz1w%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.