I've heard that, ideally, you want a similar number of documents per
shard for optimal search times. Is that correct?
My data volumes are all over the place, from 100k to tens of millions of
documents in a week.
I'm thinking about a river plugin that could:
- Take a mapping object as a template
- Define a template for child index names (project_\YYYY_\MM_\DD_\NNN =
  project_2014_04_08_000, etc.)
- Define an index shard count (5)
- Define a maximum index size in documents (1,000,000)
- Define a listening endpoint of some sort
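
For the configuration, I imagine the river's _meta document carrying
something like this. All the field names here are hypothetical, just to
make the knobs concrete; it's sketched as the Python dict you'd PUT to
/_river/<name>/_meta:

    BALANCER_META = {
        "type": "doc_balancer",                  # hypothetical river type
        "index_name_template": r"project_\YYYY_\MM_\DD_\NNN",
        "shards": 5,                             # shards per child index
        "max_docs": 1000000,                     # documents per child index
        "endpoint": {"port": 9400},              # assumed listening endpoint
        "mapping_template": {},                  # mapping object as template
    }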
Documents would stream into the listening endpoint however you wanted:
rivers, bulk loads through an API, etc. They would be automatically routed
to the lowest-numbered not-full index. So on a given day you could end up
with fifteen indexes, or eighty, or two, but each would hold at most N
records.
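
The routing step itself would look something like the sketch below. This
is a minimal client-side version in Python with elasticsearch-py, just to
show the logic; PROJECT, MAX_DOCS, SHARDS, and MAPPING are illustrative
assumptions, and the plugin would do the same thing server-side:

    import datetime
    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    PROJECT = "project"    # assumed index name prefix
    MAX_DOCS = 1000000     # maximum documents per child index
    SHARDS = 5             # shard count for each child index
    MAPPING = {}           # the mapping object used as a template

    def target_index():
        """Return the lowest-numbered child index for today that is not
        yet full, creating it from the template if it doesn't exist."""
        day = datetime.date.today().strftime("%Y_%m_%d")
        n = 0
        while True:
            name = "%s_%s_%03d" % (PROJECT, day, n)  # project_2014_04_08_000
            if not es.indices.exists(index=name):
                es.indices.create(index=name, body={
                    "settings": {"number_of_shards": SHARDS},
                    "mappings": MAPPING,
                })
                return name
            if es.count(index=name)["count"] < MAX_DOCS:
                return name
            n += 1    # this index is full; try the next suffix

    def ingest(doc):
        # Every ingestion path funnels through this one routing step,
        # which is what the plugin would centralize.
        es.index(index=target_index(), doc_type="doc", body=doc)

In practice you'd cache the current target index and keep a running count
rather than calling count() per document, and concurrent writers could
overshoot MAX_DOCS slightly, but that's the shape of it.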
A plugin seems desirable in this case, as it frees you from needing to
write the load balancing into every ingestion stream you've got.
Is this a reasonable solution to this problem? Am I overcomplicating
things?