We have a very large document archive (multi-petabyte) and are considering
using Elasticsearch to provide more flexible indexing/search capabilities.
We archive large amounts of data every day, so I was considering proposing
an architecture where we create a new index for every day, then use aliases
to combine searches across any indexes that we may need to query.
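As a sketch of the index-per-day idea (the index names and alias name below are made up for illustration), the application could derive one index name per day and build a single `_aliases` request body that groups a date range under one search alias:

```python
from datetime import date, timedelta

def daily_index_name(day, prefix="archive"):
    """Index name for one day's documents, e.g. archive-2014-01-31."""
    return "%s-%s" % (prefix, day.isoformat())

def alias_actions(start, end, alias="archive-search", prefix="archive"):
    """Build the body for POST /_aliases that points one alias at every
    daily index in [start, end], so one query can span the whole range."""
    actions = []
    day = start
    while day <= end:
        actions.append({"add": {"index": daily_index_name(day, prefix),
                                "alias": alias}})
        day += timedelta(days=1)
    return {"actions": actions}

body = alias_actions(date(2014, 1, 1), date(2014, 1, 3))
```

Posting that body to the cluster's `_aliases` endpoint would let a single search against `archive-search` fan out over all the member indexes.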
I have two questions:
1. I have read some comments recommending against a single ES cluster for
petabyte levels of data; that it is better to create separate clusters at
this scale (e.g. a separate cluster for each month). If that is the case,
are there capabilities for doing cross-cluster search/aggregation of
results, or would that have to be implemented by the application?
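If the cross-cluster part did end up in the application, the merge step for plain relevance-sorted queries is at least simple (aggregations are much harder, since partial buckets must be recombined); one caveat is that scores from separate clusters are only roughly comparable, because each cluster computes term statistics over its own data. A minimal sketch with hypothetical hit dicts:

```python
import heapq

def merge_hits(per_cluster_hits, size=10):
    """Merge already-scored hit lists from several clusters into one
    globally ranked top-`size` list (highest score first)."""
    all_hits = [h for hits in per_cluster_hits for h in hits]
    return heapq.nlargest(size, all_hits, key=lambda h: h["score"])

cluster_a = [{"id": "a1", "score": 3.2}, {"id": "a2", "score": 1.1}]
cluster_b = [{"id": "b1", "score": 2.7}]
top = merge_hits([cluster_a, cluster_b], size=2)
```

Each cluster would be queried for the full `size` (not `size / n_clusters`), since any one cluster might hold all of the best hits.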
2. I have read mixed information about the split-brain issue. Because our
archive is so large, we cannot afford to reindex large portions of it, so
split brain is a significant concern. On the one hand, I have read that
with proper configuration, split brain is not a problem. I have also read
that even with proper configuration it is still possible. So let me pose
the question this way: suppose you would be fired if you ever had to
reindex more than 5 nodes in your cluster at once...would you still use
Elasticsearch given the split-brain issue (assume perfect configuration,
i.e. that the split was not caused by a configuration error, but that
network disruptions between the nodes are possible)? I am just trying to
gauge how serious a problem this is for ES.
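For context on what "proper configuration" usually means here: the standard split-brain mitigation in Elasticsearch 1.x was to require a quorum of master-eligible nodes before any master election, via `elasticsearch.yml`. The value below assumes a hypothetical cluster with 3 master-eligible nodes:

```yaml
# elasticsearch.yml on every node:
# quorum = (master_eligible_nodes / 2) + 1, so for 3 master-eligible nodes:
discovery.zen.minimum_master_nodes: 2
```

With a correct quorum, a network partition leaves at most one side able to elect a master; the minority side rejects writes instead of diverging. The residual risk people report is usually from this value not being updated as master-eligible nodes are added or removed.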
You received this message because you are subscribed to the Google Groups "elasticsearch" group.