Logstash-Elasticsearch: looking for advice on cluster and shard/index routing


(Fernando Emwferm) #1

Hej to y'all:

I am looking for some guidance on using elasticsearch together with
logstash. I am new to both and I want to find out whether the setup we
want is possible.

We are setting up a couple of nodes in AWS EC2, located in different
availability zones. I have already managed to set up a cluster inside AWS
without the plugin, using only security groups and iptables. But we need
the cluster mainly for searching. A bigger issue for us is cost: AWS
charges for data transfer between availability zones, and that is what we
want to avoid.

So back to what our scenario looks like, we would like to have the
following:

We want two separate servers that each keep all of their indexes and shards
locally (meaning that when indexing via logstash, no new or existing shards
should be spread across the two nodes in the cluster). Node A will receive
log information from availability zone 1 and must keep that data only on
that node. Node B will receive log information from availability zone 2
with the same behavior. But (and this is where I have an issue) we would
like to be able to go to one server (let's say Node A) and query
information from both Node A and Node B. Our setup is not critical, which
is why we can skip the nice clustering and distributed functionality of
elasticsearch.

I have tried (after a lot of reading and "googling") the parameters
cluster.routing.allocation.awareness.attributes,
cluster.routing.allocation.awareness.force.zone.values, node.zone,
index.routing.allocation.total_shards_per_node,
index.routing.allocation.require.zone and
index.routing.allocation.exclude.zone. I have set these in the
elasticsearch.yml file used to start the elasticsearch cluster, and I also
put these values in the elasticsearch.yml file for logstash. But on every
try (I delete the data directory every time to test), when I index some
test logs from Node B, some shards are written to Node A anyway.
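To confirm where the shards actually land after each test, the routing table returned by GET /_cluster/state can be summarised per node. The following is only a sketch: it assumes the response contains a "nodes" map and a "routing_table.indices.<index>.shards" map as the cluster state API normally returns, and the node ids, node names and index name in the sample are made up for illustration.

```python
from collections import Counter


def shards_per_node(cluster_state):
    """Count assigned shard copies per (node name, index) pair.

    `cluster_state` is the parsed JSON of GET /_cluster/state
    (shape assumed, see the note above).
    """
    # Map internal node ids to the human-readable node names.
    node_names = {
        node_id: info.get("name", node_id)
        for node_id, info in cluster_state.get("nodes", {}).items()
    }
    counts = Counter()
    indices = cluster_state.get("routing_table", {}).get("indices", {})
    for index_name, index_info in indices.items():
        for shard_copies in index_info.get("shards", {}).values():
            for shard_copy in shard_copies:
                node_id = shard_copy.get("node")
                if node_id is None:
                    # Unassigned copy (for example a replica with no home).
                    continue
                counts[(node_names.get(node_id, node_id), index_name)] += 1
    return dict(counts)


# Hypothetical, hand-written sample standing in for a real response.
sample_state = {
    "nodes": {"idA": {"name": "node-a"}, "idB": {"name": "node-b"}},
    "routing_table": {"indices": {"logstash-2014.02.17": {"shards": {
        "0": [{"state": "STARTED", "primary": True, "node": "idA"}],
        "1": [{"state": "STARTED", "primary": True, "node": "idB"},
              {"state": "UNASSIGNED", "primary": False, "node": None}],
    }}}},
}

for (node, index), count in sorted(shards_per_node(sample_state).items()):
    print(node, index, count)
```

If an index that was fed only from Node B shows up under both node names, the allocation settings are not being applied to that index.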

Do you think this setup is possible at all? Or maybe the distributed
behavior of elasticsearch cannot be changed like this. That is OK too: as
long as I know, I can move on to another setup, since, as I said, this is
nothing critical. But I cannot spend much more time investigating this (I
have already been at it for a couple of weeks).

Thank you to anyone for their time, attention and help, best regards,
s.r./Fernando
P.S.: I am using logstash 1.3.3 and elasticsearch 0.90.9 with Kibana 3
Milestone 4.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a770c8dd-6498-4722-9e6e-4fd034523434%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Tony Su) #2

Hi, I just read an article online which, although it doesn't describe your
scenario exactly, might be helpful:
http://blog.qbox.io/launching-and-scaling-elasticsearch

My guess is that you're simply setting up two boxes with the following
configuration:
index.number_of_shards: 1
index.number_of_replicas: 0

I don't know how to keep data on the server where it was input; maybe
someone else can comment on that. In my testing with multiple shards, I
have not found any way to control where a shard is allocated. Maybe
disable load balancing?

I would expect that any query you run against your cluster will be executed
on both nodes.
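As a rough illustration of querying from one node, a single search request can name several indices (comma-separated, wildcards allowed) and Elasticsearch will fan it out to whichever nodes hold them. The sketch below only builds the URL; the index patterns logstash-zone1-* and logstash-zone2-* are hypothetical names, not something from this thread, and would require Logstash to write to per-zone index names.

```python
from urllib.parse import quote


def search_url(host, index_patterns, port=9200):
    """Build a _search URL that targets several indices at once."""
    # Comma separates index names; '*' is the wildcard. Keep both unescaped.
    indices = quote(",".join(index_patterns), safe=",*")
    return "http://{}:{}/{}/_search".format(host, port, indices)


# Hypothetical per-zone index patterns, queried through Node A only.
url = search_url("localhost", ["logstash-zone1-*", "logstash-zone2-*"])
print(url)
```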

Tony

In reply to Fernando Emwferm's post of Monday, February 17, 2014 7:32:35 AM UTC-8.



(Binh Ly) #3

What you probably really want is to eventually migrate to 1.0 and then
build 2 separate clusters (one in each AZ) and then index into them
separately, and then use the tribe node
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-tribe.html)
to federate your search results across clusters.

In the meantime assuming you really want a single cluster, there's no
reason why you can't use index allocation awareness to do what you want. So
for example:

ES.yml on zone 1 would have:

node.zone: zone1

ES.yml on zone 2 would have:

node.zone: zone2

Then when you create an index that should live only in zone 1, you would
say something like this (maybe just use an index template for convenience):

PUT http://localhost:9200/<zone1_index_name>
{
  "settings": { "index.routing.allocation.include.zone": "zone1" }
}

And for an index that should live only in zone 2:

PUT http://localhost:9200/<zone2_index_name>
{
  "settings": { "index.routing.allocation.include.zone": "zone2" }
}
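Since Logstash creates a new index every day, the index template route mentioned above might look roughly like the sketch below. It only builds and prints the request bodies for PUT /_template/<name>. Several things here are assumptions rather than part of the advice above: the template names, the logstash-<zone>-* index patterns (Logstash would have to be configured to write to such per-zone index names), and number_of_replicas set to 0, which is included because with a single node per zone a replica would have nowhere to be allocated.

```python
import json


def zone_template(zone, index_pattern):
    """Build an index template body pinning matching indices to one zone."""
    return {
        # Legacy template format: "template" holds the index name pattern.
        "template": index_pattern,
        "settings": {
            # Only allocate shards on nodes started with node.zone: <zone>.
            "index.routing.allocation.include.zone": zone,
            # Assumption: one node per zone, so replicas cannot be placed.
            "index.number_of_replicas": 0,
        },
    }


for zone in ("zone1", "zone2"):
    # Hypothetical naming scheme: logstash-zone1-2014.02.17 and so on.
    body = zone_template(zone, "logstash-{}-*".format(zone))
    print("PUT /_template/logstash_{}".format(zone))
    print(json.dumps(body, indent=2, sort_keys=True))
```

Templates apply only to indices created after the template is stored, so existing indices would still need the setting applied directly.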


