ES and SAN storage

Hello,

I'm still testing ES at a very small scale (1 node on a multipurpose server), but I would like to extend its use at work as a backend for Logstash. That means the LS+ES cluster would have to ingest a few GB of data every day, growing to 15 or 20 GB later if things go well.
I'm doing all this as a side project: no investment apart from work hours. I will recycle blades and storage we plan to decommission from our virtualization farm.
So I'm likely to end up with 2 or 3 dual-Xeon blades, but no real internal storage (just an SD card), and a LUN on a SAN.

How does ES behave in a shared-storage configuration? What are the best practices regarding nodes/shards/replicas/...?
The intended audience is the operations team, so fewer than 10 people: no big search concurrency, but probably mostly "deep" searches and ill-designed queries :-)

thanks,
Patrick

I think anyone will find it difficult to answer such questions, simply because several factors drive the decision: latency requirements, high-availability requirements, how shared the SAN storage is, the impact of somebody stealing IO under the hood, etc. The best way is to develop a test model and test it out. Look at the cluster settings for how to disable/enable shard allocation.
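For example, allocation can be toggled cluster-wide through the cluster settings API (a minimal sketch against the ES 1.x REST API; host and port are assumptions for your setup):

    # Disable shard allocation (e.g. before taking nodes down for maintenance):
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "transient": { "cluster.routing.allocation.enable": "none" }
    }'

    # Re-enable it afterwards:
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "transient": { "cluster.routing.allocation.enable": "all" }
    }'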

Well, then maybe my questions were not precise enough.
My first goal was to make sure ES actually works when all nodes share a single storage backend.
My second goal was to learn whether each node needs its own dedicated file tree, or whether you can put all the files together as if there were only one ES node.
Does it make sense to have replicas when the filesystem IOs are ultimately shared?
Does moving a shard from one node to another make the data pass through the CPU, or is ES smart enough to just hand over a pointer to the files?

I'll try to answer as much as I know:

ES shouldn't have any issues working with SAN, NFS or EBS. Yes, each node needs its own unique file path; nodes don't share files with each other.
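Concretely, you could carve the shared LUN into one data directory per node, e.g. in each node's elasticsearch.yml (a sketch; the mount point and directory names are made up for illustration):

    # node 1 -- its own directory on the shared LUN
    node.name: es-node-1
    path.data: /mnt/san-lun/es-node-1

    # node 2 -- a separate directory, never shared with node 1
    node.name: es-node-2
    path.data: /mnt/san-lun/es-node-2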
Replicas in this setup only make sense if you are solving for a VM or node failure per se. They also make sense if you have SAN storage coming from a different array.
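If the single array is your only redundancy anyway, you might decide replicas buy you little and turn them off per index (again just a sketch; the index name is hypothetical):

    curl -XPUT 'http://localhost:9200/logstash-2014.04.30/_settings' -d '{
      "index": { "number_of_replicas": 0 }
    }'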

I don't follow your last question.

On 30 Apr. 2014, at 19:34, Mohit Anchlia wrote:

I'll try to answer as much as I know:

ES shouldn't have any issues working with SAN, NFS or EBS. Yes, each node needs its own unique file path; nodes don't share files with each other.

ok.

Replicas in this setup only make sense if you are solving for a VM or node failure per se. They also make sense if you have SAN storage coming from a different array.

ok.

I don't follow your last question.

My English is limited, sorry. As far as I understand ES, some shard balancing occurs in the background: when shards are created or deleted, others will move from node to node so that the number of shards stays even across nodes. When storage is isolated per node, moving a shard to another node requires the files to go through the source node's CPU/RAM, then the network, then the remote node's CPU/RAM, then storage. In a shared-storage scenario it would be very nice if the shard were moved not through fs-cpu-ram-network-cpu-ram-fs, but through a simple rename-and-tell action.
Does that make sense?

It would make sense if it were just that simple :-) The reason shards have to move through the higher levels of the stack is that every node maintains its own indexes, i.e. its own Lucene segments, and those can't simply be switched over to another node. I think that is primarily because of how internal structures are maintained in Lucene. You might be able to develop a workaround using one or more of the cluster routing settings:
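For instance, the rebalancing behaviour itself can be throttled or disabled (a sketch of the ES 1.x settings that seem relevant here; whether these were the settings originally meant is an assumption):

    # Stop ES from moving shards around on its own:
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "transient": { "cluster.routing.rebalance.enable": "none" }
    }'

    # Or just cap how many shards may rebalance at once:
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "transient": { "cluster.routing.allocation.cluster_concurrent_rebalance": 1 }
    }'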
