We have a cluster set up already for indexing customer documents (bigger
documents, medium traffic). We want to add event logging (small documents,
lots of traffic). Should we do this on a new cluster?
When do you typically create a separate cluster vs. add indices to an
existing cluster?
Is it straightforward to move indices to a new cluster at a later date?
Reasons I can think of for using the same cluster:
- Fewer servers to administer
- ES is already well equipped to handle multiple indices with various
  mappings
- Simple searching across event & document indices, if needed
- If we were planning to add new servers for an events cluster, why not
  have them in the same cluster for more redundancy?
Reasons I can think of for a new, separate cluster:
- The cluster can be tuned to the relevant performance needs
- Don't have to worry about the flood of events using up all the disk
  space
- Separation of responsibility / "single responsibility" design pattern
- If one cluster goes down / hits performance problems, the other can
  continue along fine
I couldn't see any blog posts or documentation around best practice for
this, so your insight would be most welcome!
There's no real best practice here at this stage.
But if you think about traditional datastores (i.e. DBs), would you mix these
data sets?
(In reply to Nick Malcolm's original post of 22 March 2015, 13:34; quoted text trimmed.)
Good question - that gave me something to Google. Seems the issue is pretty
subjective.
I guess the question becomes: is an index equivalent to a traditional
database, or is a cluster? In my mind, index = traditional database,
types = tables, etc., in which case we would add event indices to the
same cluster.
Maybe we'll do it that way, and if it hurts later we'll look at putting it
on its own cluster, or other performance improvements.
(In reply to Mark Walkom's message of Monday, March 23, 2015 at 11:26:18 AM UTC+13; quoted text trimmed.)
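As a rough illustration of the "same cluster now, split later if it hurts" approach: one common way to keep event data contained is to write it to date-stamped indices and drop old ones, which also bounds how much disk the events can use. The sketch below is plain Python with no Elasticsearch client involved. The `events-` prefix, the `YYYY.MM.DD` suffix, and both function names are assumptions made for illustration, not anything stated in this thread.

```python
from datetime import date, datetime, timedelta


def event_index_name(day, prefix="events-"):
    """Return a date-stamped index name, e.g. events-2015.03.23 (hypothetical scheme)."""
    return f"{prefix}{day:%Y.%m.%d}"


def expired_indices(existing, today, keep_days, prefix="events-"):
    """Return the event indices whose date is older than `keep_days` before `today`.

    Names without the prefix, or with a suffix that is not a date,
    are ignored so that document indices are never selected.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in existing:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a date-stamped event index
        if day < cutoff:
            expired.append(name)
    return sorted(expired)
```

The names returned by `expired_indices` could then be handed to whatever client you use to delete indices. Because each day's events live in their own index, they are also separate units if you later decide to move events to their own cluster.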
The answers to a lot of these questions depend on your use case. If you
have low volume for one new dataset, it totally makes sense to leverage
existing infrastructure and then make future decisions as you go.
(In reply to Nick Malcolm's message of 23 March 2015, 10:35; quoted text trimmed.)
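For the single-cluster route, the "simple searching across event & document indices" point from the original post comes down to listing several index names (or wildcard patterns) in one search request path. Here is a minimal sketch that only builds the request and does not send it. The index names `documents` and `events-*`, the `message` field, and the helper name `build_search_request` are all hypothetical.

```python
import json


def build_search_request(indices, field, text):
    """Build the path and JSON body for one search across several indices.

    `indices` is a list of index names or wildcard patterns; Elasticsearch
    accepts them comma-separated in the URL path before /_search.
    """
    if not indices:
        raise ValueError("at least one index name or pattern is required")
    path = "/" + ",".join(indices) + "/_search"
    body = {"query": {"match": {field: text}}}
    return path, json.dumps(body)


# Hypothetical index and field names, for illustration only.
path, body = build_search_request(["documents", "events-*"], "message", "timeout")
print("GET", path)
print(body)
```

With two separate clusters the same search would need two requests and a merge step on the client side, which is the trade-off listed in the original post.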