Azure Cloud Plugin Problems


(Andrew Westgarth) #1

Hi,
I read with interest the news about the Azure Cloud Plugin over the
weekend and today have been trying to get it working with Windows VMs on
Azure with mixed levels of success.

I have two environments/clusters one which has been running for a few weeks
and another which is brand new and only been running for a couple of days;
both have the head plugin installed so I can see the status of the
cluster(s).

All of the clusters consist of 3 machines are using the Windows Server 2012
R2 Datacenter base image with java 7 added, and elasticsearch 0.90.10
installed as a service set to automatic startup.

Cluster 1 - been running using multicast discovery disabled and the ip
addresses of the nodes listed. I have since installed the azure cloud
plugin, added the certificate and configuration to the node and enabled
multicast discovery again and commented out the list of ip addresses. Now
when I view the details of the cluster, none of the nodes can see each
other and the cluster health status is marked in amber as the full cluster
is no longer available.

the elasticsearch.yml file is as follows:

##################### ElasticSearch Configuration Example
#####################

This file contains an overview of various configuration settings,

targeted at operations staff. Application developers should

consult the guide at http://elasticsearch.org/guide.

The installation procedure is covered at

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html

.

ElasticSearch comes with reasonable defaults for most settings,

so you can try it out without bothering with configuration.

Most of the time, these defaults are just fine for running a production

cluster. If you're fine-tuning your cluster, or wondering about the

effect of certain configuration option, please do ask on the

mailing list or IRC channel [http://elasticsearch.org/community].

Any element in the configuration can be replaced with environment

variables

by placing them in ${...} notation. For example:

node.rack: ${RACK_ENV_VAR}

For information on supported formats and syntax for the config file, see

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html

################################### Cluster
###################################

Cluster name identifies your cluster for auto-discovery. If you're running

multiple clusters on the same network, make sure you're using unique

names.

cluster.name: elasticsearch

#################################### Node
#####################################

Node names are generated dynamically on startup, so you're relieved

from configuring them manually. You can tie this node to a specific name:

node.name: "Franz Kafka"

Every node can be configured to allow or deny being eligible as the

master,

and to allow or deny to store the data.

Allow this node to be eligible as a master node (enabled by default):

node.master: true

Allow this node to store data (enabled by default):

node.data: true

You can exploit these settings to design advanced cluster topologies.

1. You want this node to never become a master node, only to hold data.

This will be the "workhorse" of your cluster.

node.master: false

node.data: true

2. You want this node to only serve as a master: to not store any data and

to have free resources. This will be the "coordinator" of your cluster.

node.master: true

node.data: false

3. You want this node to be neither master nor data node, but

to act as a "search load balancer" (fetching data from nodes,

aggregating results, etc.)

node.master: false

node.data: false

Use the Cluster Health API [http://localhost:9200/_cluster/health], the

Node Info API [http://localhost:9200/_cluster/nodes] or GUI tools

such as http://github.com/lukas-vlcek/bigdesk and

http://mobz.github.com/elasticsearch-head to inspect the cluster state.

A node can have generic attributes associated with it, which can later be

used

for customized shard allocation filtering, or allocation awareness. An

attribute

is a simple key value pair, similar to node.key: value, here is an

example:

node.rack: rack314

By default, multiple nodes are allowed to start from the same

installation location

to disable it, set the following:

node.max_local_storage_nodes: 1

#################################### Index
####################################

You can set a number of options (such as shard/replica options, mapping

or analyzer definitions, translog settings, ...) for indices globally,

in this file.

Note, that it makes more sense to configure index settings specifically

for

a certain index, either when creating it or by using the index templates

API.

See <

http://elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html>
and

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html

for more information.

Set the number of shards (splits) of an index (5 by default):

index.number_of_shards: 5

Set the number of replicas (additional copies) of an index (1 by default):

index.number_of_replicas: 1

Note, that for development on a local machine, with small indices, it

usually

makes sense to "disable" the distributed features:

index.number_of_shards: 1

index.number_of_replicas: 0

These settings directly affect the performance of index and search

operations

in your cluster. Assuming you have enough machines to hold shards and

replicas, the rule of thumb is:

1. Having more shards enhances the indexing performance and allows to

distribute a big index across machines.

2. Having more replicas enhances the search performance and improves

the

cluster availability.

The "number_of_shards" is a one-time setting for an index.

The "number_of_replicas" can be increased or decreased anytime,

by using the Index Update Settings API.

ElasticSearch takes care about load balancing, relocating, gathering the

results from nodes, etc. Experiment with different settings to fine-tune

your setup.

Use the Index Status API (http://localhost:9200/A/_status) to inspect

the index status.

#################################### Paths
####################################

Path to directory containing configuration (this file and logging.yml):

path.conf: /path/to/conf

Path to directory where to store index data allocated for this node.

path.data: /path/to/data

Can optionally include more than one location, causing data to be striped

across

the locations (a la RAID 0) on a file level, favouring locations with

most free

space on creation. For example:

path.data: /path/to/data1,/path/to/data2

Path to temporary files:

path.work: /path/to/work

Path to log files:

path.logs: /path/to/logs

Path to where plugins are installed:

path.plugins: /path/to/plugins

#################################### Plugin
###################################

If a plugin listed here is not installed for current node, the node will

not start.

plugin.mandatory: mapper-attachments,lang-groovy

################################### Memory
####################################

ElasticSearch performs poorly when JVM starts swapping: you should ensure

that

it never swaps.

Set this property to true to lock the memory:

bootstrap.mlockall: true

Make sure that the ES_MIN_MEM and ES_MAX_MEM environment variables are set

to the same value, and that the machine has enough memory to allocate

for ElasticSearch, leaving enough memory for the operating system itself.

You should also make sure that the ElasticSearch process is allowed to

lock

the memory, eg. by using ulimit -l unlimited.

############################## Network And HTTP
###############################

ElasticSearch, by default, binds itself to the 0.0.0.0 address, and

listens

on port [9200-9300] for HTTP traffic and on port [9300-9400] for

node-to-node

communication. (the range means that if the port is busy, it will

automatically

try the next port).

Set the bind address specifically (IPv4 or IPv6):

network.bind_host: 192.168.0.1

Set the address other nodes will use to communicate with this node. If not

set, it is automatically derived. It must point to an actual IP address.

network.publish_host: 192.168.0.1

Set both 'bind_host' and 'publish_host':

network.host: 192.168.0.1

Set a custom port for the node to node communication (9300 by default):

transport.tcp.port: 9300

Enable compression for all communication between nodes (disabled by

default):

transport.tcp.compress: true

Set a custom port to listen for HTTP traffic:

http.port: 9200

Set a custom allowed content length:

http.max_content_length: 100mb

Disable HTTP completely:

http.enabled: false

################################### Gateway
###################################

The gateway allows for persisting the cluster state between full cluster

restarts. Every change to the state (such as adding an index) will be

stored

in the gateway, and when the cluster starts up for the first time,

it will read its state from the gateway.

There are several types of gateway implementations. For more information,

see

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html

.

The default gateway type is the "local" gateway (recommended):

gateway.type: local

Settings below control how and when to start the initial recovery process

on

a full cluster restart (to reuse as much local data as possible when

using shared

gateway).

Allow recovery process after N nodes in a cluster are up:

gateway.recover_after_nodes: 1

Set the timeout to initiate the recovery process, once the N nodes

from previous setting are up (accepts time value):

gateway.recover_after_time: 5m

Set how many nodes are expected in this cluster. Once these N nodes

are up (and recover_after_nodes is met), begin recovery process

immediately

(without waiting for recover_after_time to expire):

gateway.expected_nodes: 2

############################# Recovery Throttling
#############################

These settings allow to control the process of shards allocation between

nodes during initial recovery, replica allocation, rebalancing,

or when adding and removing nodes.

Set the number of concurrent recoveries happening on a node:

1. During the initial recovery

cluster.routing.allocation.node_initial_primaries_recoveries: 4

2. During adding/removing nodes, rebalancing, etc

cluster.routing.allocation.node_concurrent_recoveries: 2

Set to throttle throughput when recovering (eg. 100mb, by default 20mb):

indices.recovery.max_bytes_per_sec: 20mb

Set to limit the number of open concurrent streams when

recovering a shard from a peer:

indices.recovery.concurrent_streams: 5

################################## Discovery
##################################

Discovery infrastructure ensures nodes can be found within a cluster

and master node is elected. Multicast discovery is the default.

Set to ensure a node sees N other master eligible nodes to be considered

operational within the cluster. Its recommended to set it to a higher

value

than 1 when running more than 2 nodes in the cluster.

discovery.zen.minimum_master_nodes: 1

Set the time to wait for ping responses from other nodes when discovering.

Set this option to a higher value on a slow or congested network

to minimize discovery failures:

discovery.zen.ping.timeout: 3s

For more information, see

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html

Unicast discovery allows to explicitly control which nodes will be used

to discover the cluster. It can be used when multicast is not present,

or to restrict the cluster communication-wise.

1. Disable multicast discovery (enabled by default):

discovery.zen.ping.multicast.enabled: false

2. Configure an initial list of master nodes in the cluster

to perform discovery when new nodes (master or data) are started:

discovery.zen.ping.unicast.hosts: ["10.0.0.4", "10.0.0.5", "10.0.0.6"]

EC2 discovery allows to use AWS EC2 API in order to perform discovery.

You have to install the cloud-aws plugin for enabling the EC2 discovery.

For more information, see

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-ec2.html

See http://elasticsearch.org/tutorials/elasticsearch-on-ec2/

for a step-by-step tutorial.

################################## Slow Log
##################################

Shard level query and fetch threshold logging.

#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms
#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms
#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms
################################## GC Logging
################################
#monitor.jvm.gc.young.warn: 1000ms
#monitor.jvm.gc.young.info: 700ms
#monitor.jvm.gc.young.debug: 400ms
#monitor.jvm.gc.old.warn: 10s
#monitor.jvm.gc.old.info: 5s
#monitor.jvm.gc.old.debug: 2s
################################# AZURE PLUGIN
###############################
cloud:
azure:
keystore: c:/Certs/certificate.pfx
password: password
subscription_id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
service_name: apw-es-vms
discovery:
type: azure

Cluster 2 - brand new clean cluster with same base configuration and only
differences in elasticsearch.yml file are subscription id, service name and
cluster name. Once again none of the nodes in this configuration can see
each other.

I suspect this is a configuration issue but my experience with
elasticsearch is limited. Does anyone have any ideas what I could have
configured incorrectly?

Thanks

Andrew

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/12cca085-c7db-440e-8740-0bb973ff68a5%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

Could you please GIST your logs on both nodes?
Also, could you change Log level to TRACE for discovery? (See config/logging.yml file)

Thanks

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 4 févr. 2014 à 22:42, Andrew Westgarth mail@hawaythelads.co.uk a écrit :

Hi,
I read with interest the news about the Azure Cloud Plugin over the weekend and today have been trying to get it working with Windows VMs on Azure with mixed levels of success.

I have two environments/clusters one which has been running for a few weeks and another which is brand new and only been running for a couple of days; both have the head plugin installed so I can see the status of the cluster(s).

All of the clusters consist of 3 machines are using the Windows Server 2012 R2 Datacenter base image with java 7 added, and elasticsearch 0.90.10 installed as a service set to automatic startup.

Cluster 1 - been running using multicast discovery disabled and the ip addresses of the nodes listed. I have since installed the azure cloud plugin, added the certificate and configuration to the node and enabled multicast discovery again and commented out the list of ip addresses. Now when I view the details of the cluster, none of the nodes can see each other and the cluster health status is marked in amber as the full cluster is no longer available.

the elasticsearch.yml file is as follows:

##################### ElasticSearch Configuration Example #####################

This file contains an overview of various configuration settings,

targeted at operations staff. Application developers should

consult the guide at http://elasticsearch.org/guide.

The installation procedure is covered at

http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html.

ElasticSearch comes with reasonable defaults for most settings,

so you can try it out without bothering with configuration.

Most of the time, these defaults are just fine for running a production

cluster. If you're fine-tuning your cluster, or wondering about the

effect of certain configuration option, please do ask on the

mailing list or IRC channel [http://elasticsearch.org/community].

Any element in the configuration can be replaced with environment variables

by placing them in ${...} notation. For example:

node.rack: ${RACK_ENV_VAR}

For information on supported formats and syntax for the config file, see

http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html

################################### Cluster ###################################

Cluster name identifies your cluster for auto-discovery. If you're running

multiple clusters on the same network, make sure you're using unique names.

cluster.name: elasticsearch

#################################### Node #####################################

Node names are generated dynamically on startup, so you're relieved

from configuring them manually. You can tie this node to a specific name:

node.name: "Franz Kafka"

Every node can be configured to allow or deny being eligible as the master,

and to allow or deny to store the data.

Allow this node to be eligible as a master node (enabled by default):

node.master: true

Allow this node to store data (enabled by default):

node.data: true

You can exploit these settings to design advanced cluster topologies.

1. You want this node to never become a master node, only to hold data.

This will be the "workhorse" of your cluster.

node.master: false

node.data: true

2. You want this node to only serve as a master: to not store any data and

to have free resources. This will be the "coordinator" of your cluster.

node.master: true

node.data: false

3. You want this node to be neither master nor data node, but

to act as a "search load balancer" (fetching data from nodes,

aggregating results, etc.)

node.master: false

node.data: false

Use the Cluster Health API [http://localhost:9200/_cluster/health], the

Node Info API [http://localhost:9200/_cluster/nodes] or GUI tools

such as http://github.com/lukas-vlcek/bigdesk and

http://mobz.github.com/elasticsearch-head to inspect the cluster state.

A node can have generic attributes associated with it, which can later be used

for customized shard allocation filtering, or allocation awareness. An attribute

is a simple key value pair, similar to node.key: value, here is an example:

node.rack: rack314

By default, multiple nodes are allowed to start from the same installation location

to disable it, set the following:

node.max_local_storage_nodes: 1

#################################### Index ####################################

You can set a number of options (such as shard/replica options, mapping

or analyzer definitions, translog settings, ...) for indices globally,

in this file.

Note, that it makes more sense to configure index settings specifically for

a certain index, either when creating it or by using the index templates API.

See http://elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html and

http://elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html

for more information.

Set the number of shards (splits) of an index (5 by default):

index.number_of_shards: 5

Set the number of replicas (additional copies) of an index (1 by default):

index.number_of_replicas: 1

Note, that for development on a local machine, with small indices, it usually

makes sense to "disable" the distributed features:

index.number_of_shards: 1

index.number_of_replicas: 0

These settings directly affect the performance of index and search operations

in your cluster. Assuming you have enough machines to hold shards and

replicas, the rule of thumb is:

1. Having more shards enhances the indexing performance and allows to

distribute a big index across machines.

2. Having more replicas enhances the search performance and improves the

cluster availability.

The "number_of_shards" is a one-time setting for an index.

The "number_of_replicas" can be increased or decreased anytime,

by using the Index Update Settings API.

ElasticSearch takes care about load balancing, relocating, gathering the

results from nodes, etc. Experiment with different settings to fine-tune

your setup.

Use the Index Status API (http://localhost:9200/A/_status) to inspect

the index status.

#################################### Paths ####################################

Path to directory containing configuration (this file and logging.yml):

path.conf: /path/to/conf

Path to directory where to store index data allocated for this node.

path.data: /path/to/data

Can optionally include more than one location, causing data to be striped across

the locations (a la RAID 0) on a file level, favouring locations with most free

space on creation. For example:

path.data: /path/to/data1,/path/to/data2

Path to temporary files:

path.work: /path/to/work

Path to log files:

path.logs: /path/to/logs

Path to where plugins are installed:

path.plugins: /path/to/plugins

#################################### Plugin ###################################

If a plugin listed here is not installed for current node, the node will not start.

plugin.mandatory: mapper-attachments,lang-groovy

################################### Memory ####################################

ElasticSearch performs poorly when JVM starts swapping: you should ensure that

it never swaps.

Set this property to true to lock the memory:

bootstrap.mlockall: true

Make sure that the ES_MIN_MEM and ES_MAX_MEM environment variables are set

to the same value, and that the machine has enough memory to allocate

for ElasticSearch, leaving enough memory for the operating system itself.

You should also make sure that the ElasticSearch process is allowed to lock

the memory, eg. by using ulimit -l unlimited.

############################## Network And HTTP ###############################

ElasticSearch, by default, binds itself to the 0.0.0.0 address, and listens

on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node

communication. (the range means that if the port is busy, it will automatically

try the next port).

Set the bind address specifically (IPv4 or IPv6):

network.bind_host: 192.168.0.1

Set the address other nodes will use to communicate with this node. If not

set, it is automatically derived. It must point to an actual IP address.

network.publish_host: 192.168.0.1

Set both 'bind_host' and 'publish_host':

network.host: 192.168.0.1

Set a custom port for the node to node communication (9300 by default):

transport.tcp.port: 9300

Enable compression for all communication between nodes (disabled by default):

transport.tcp.compress: true

Set a custom port to listen for HTTP traffic:

http.port: 9200

Set a custom allowed content length:

http.max_content_length: 100mb

Disable HTTP completely:

http.enabled: false

################################### Gateway ###################################

The gateway allows for persisting the cluster state between full cluster

restarts. Every change to the state (such as adding an index) will be stored

in the gateway, and when the cluster starts up for the first time,

it will read its state from the gateway.

There are several types of gateway implementations. For more information, see

http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html.

The default gateway type is the "local" gateway (recommended):

gateway.type: local

Settings below control how and when to start the initial recovery process on

a full cluster restart (to reuse as much local data as possible when using shared

gateway).

Allow recovery process after N nodes in a cluster are up:

gateway.recover_after_nodes: 1

Set the timeout to initiate the recovery process, once the N nodes

from previous setting are up (accepts time value):

gateway.recover_after_time: 5m

Set how many nodes are expected in this cluster. Once these N nodes

are up (and recover_after_nodes is met), begin recovery process immediately

(without waiting for recover_after_time to expire):

gateway.expected_nodes: 2

############################# Recovery Throttling #############################

These settings allow to control the process of shards allocation between

nodes during initial recovery, replica allocation, rebalancing,

or when adding and removing nodes.

Set the number of concurrent recoveries happening on a node:

1. During the initial recovery

cluster.routing.allocation.node_initial_primaries_recoveries: 4

2. During adding/removing nodes, rebalancing, etc

cluster.routing.allocation.node_concurrent_recoveries: 2

Set to throttle throughput when recovering (eg. 100mb, by default 20mb):

indices.recovery.max_bytes_per_sec: 20mb

Set to limit the number of open concurrent streams when

recovering a shard from a peer:

indices.recovery.concurrent_streams: 5

################################## Discovery ##################################

Discovery infrastructure ensures nodes can be found within a cluster

and master node is elected. Multicast discovery is the default.

Set to ensure a node sees N other master eligible nodes to be considered

operational within the cluster. Its recommended to set it to a higher value

than 1 when running more than 2 nodes in the cluster.

discovery.zen.minimum_master_nodes: 1

Set the time to wait for ping responses from other nodes when discovering.

Set this option to a higher value on a slow or congested network

to minimize discovery failures:

discovery.zen.ping.timeout: 3s

For more information, see

http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html

Unicast discovery allows to explicitly control which nodes will be used

to discover the cluster. It can be used when multicast is not present,

or to restrict the cluster communication-wise.

1. Disable multicast discovery (enabled by default):

discovery.zen.ping.multicast.enabled: false

2. Configure an initial list of master nodes in the cluster

to perform discovery when new nodes (master or data) are started:

discovery.zen.ping.unicast.hosts: ["10.0.0.4", "10.0.0.5", "10.0.0.6"]

EC2 discovery allows to use AWS EC2 API in order to perform discovery.

You have to install the cloud-aws plugin for enabling the EC2 discovery.

For more information, see

http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-ec2.html

See http://elasticsearch.org/tutorials/elasticsearch-on-ec2/

for a step-by-step tutorial.

################################## Slow Log ##################################

Shard level query and fetch threshold logging.

#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms
#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms
#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms
################################## GC Logging ################################
#monitor.jvm.gc.young.warn: 1000ms
#monitor.jvm.gc.young.info: 700ms
#monitor.jvm.gc.young.debug: 400ms
#monitor.jvm.gc.old.warn: 10s
#monitor.jvm.gc.old.info: 5s
#monitor.jvm.gc.old.debug: 2s
################################# AZURE PLUGIN ###############################
cloud:
azure:
keystore: c:/Certs/certificate.pfx
password: password
subscription_id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
service_name: apw-es-vms
discovery:
type: azure

Cluster 2 - brand new clean cluster with same base configuration and only differences in elasticsearch.yml file are subscription id, service name and cluster name. Once again none of the nodes in this configuration can see each other.

I suspect this is a configuration issue but my experience with elasticsearch is limited. Does anyone have any ideas what I could have configured incorrectly?

Thanks

Andrew

You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/12cca085-c7db-440e-8740-0bb973ff68a5%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/158117B5-996D-45D9-8BD5-8B9F1DB4D6B0%40pilato.fr.
For more options, visit https://groups.google.com/groups/opt_out.


(Andrew Westgarth) #3

Hi David,
here's the gist for the logs from the three nodes of Cluster 2

  • https://gist.github.com/apwestgarth/8813941 first thing I noticed which
    is strange is node 1 is referring to the cluster as sageerpdev_escluster
    whereas node 2 and 3 are correctly referring to it as sageerpdevescluster.
    The config files (elasticsearch,yml) are the same on each node :s so not
    sure why that's happening.

I've since reverted cluster 1 back to unicast mode so I can carry on
working with the old environment.

Thanks

Andrew

On Tuesday, 4 February 2014 22:22:49 UTC, David Pilato wrote:

Could you please GIST your logs on both nodes?
Also, could you change Log level to TRACE for discovery? (See
config/logging.yml file)

Thanks

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 4 févr. 2014 à 22:42, Andrew Westgarth <ma...@hawaythelads.co.uk<javascript:>>
a écrit :

Hi,
I read with interest the news about the Azure Cloud Plugin over the
weekend and today have been trying to get it working with Windows VMs on
Azure with mixed levels of success.

I have two environments/clusters one which has been running for a few
weeks and another which is brand new and only been running for a couple of
days; both have the head plugin installed so I can see the status of the
cluster(s).

All of the clusters consist of 3 machines are using the Windows Server
2012 R2 Datacenter base image with java 7 added, and elasticsearch 0.90.10
installed as a service set to automatic startup.

Cluster 1 - been running using multicast discovery disabled and the ip
addresses of the nodes listed. I have since installed the azure cloud
plugin, added the certificate and configuration to the node and enabled
multicast discovery again and commented out the list of ip addresses. Now
when I view the details of the cluster, none of the nodes can see each
other and the cluster health status is marked in amber as the full cluster
is no longer available.

the elasticsearch.yml file is as follows:

##################### ElasticSearch Configuration Example
#####################

This file contains an overview of various configuration settings,

targeted at operations staff. Application developers should

consult the guide at http://elasticsearch.org/guide.

The installation procedure is covered at

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html

.

ElasticSearch comes with reasonable defaults for most settings,

so you can try it out without bothering with configuration.

Most of the time, these defaults are just fine for running a production

cluster. If you're fine-tuning your cluster, or wondering about the

effect of certain configuration option, please do ask on the

mailing list or IRC channel [http://elasticsearch.org/community].

Any element in the configuration can be replaced with environment

variables

by placing them in ${...} notation. For example:

node.rack: ${RACK_ENV_VAR}

For information on supported formats and syntax for the config file, see

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html

################################### Cluster
###################################

Cluster name identifies your cluster for auto-discovery. If you're

running

multiple clusters on the same network, make sure you're using unique

names.

cluster.name: elasticsearch

#################################### Node
#####################################

Node names are generated dynamically on startup, so you're relieved

from configuring them manually. You can tie this node to a specific name:

node.name: "Franz Kafka"

Every node can be configured to allow or deny being eligible as the

master,

and to allow or deny to store the data.

Allow this node to be eligible as a master node (enabled by default):

node.master: true

Allow this node to store data (enabled by default):

node.data: true

You can exploit these settings to design advanced cluster topologies.

1. You want this node to never become a master node, only to hold data.

This will be the "workhorse" of your cluster.

node.master: false

node.data: true

2. You want this node to only serve as a master: to not store any data

and

to have free resources. This will be the "coordinator" of your

cluster.

node.master: true

node.data: false

3. You want this node to be neither master nor data node, but

to act as a "search load balancer" (fetching data from nodes,

aggregating results, etc.)

node.master: false

node.data: false

Use the Cluster Health API [http://localhost:9200/_cluster/health], the

Node Info API [http://localhost:9200/_cluster/nodes] or GUI tools

such as http://github.com/lukas-vlcek/bigdesk and

http://mobz.github.com/elasticsearch-head to inspect the cluster

state.

A node can have generic attributes associated with it, which can later

be used

for customized shard allocation filtering, or allocation awareness. An

attribute

is a simple key value pair, similar to node.key: value, here is an

example:

node.rack: rack314

By default, multiple nodes are allowed to start from the same

installation location

to disable it, set the following:

node.max_local_storage_nodes: 1

#################################### Index
####################################

You can set a number of options (such as shard/replica options, mapping

or analyzer definitions, translog settings, ...) for indices globally,

in this file.

Note, that it makes more sense to configure index settings specifically

for

a certain index, either when creating it or by using the index templates

API.

See <

http://elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html>
and

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html

for more information.

Set the number of shards (splits) of an index (5 by default):

index.number_of_shards: 5

Set the number of replicas (additional copies) of an index (1 by

default):

index.number_of_replicas: 1

Note, that for development on a local machine, with small indices, it

usually

makes sense to "disable" the distributed features:

index.number_of_shards: 1

index.number_of_replicas: 0

These settings directly affect the performance of index and search

operations

in your cluster. Assuming you have enough machines to hold shards and

replicas, the rule of thumb is:

1. Having more shards enhances the indexing performance and allows to

distribute a big index across machines.

2. Having more replicas enhances the search performance and improves

the

cluster availability.

The "number_of_shards" is a one-time setting for an index.

The "number_of_replicas" can be increased or decreased anytime,

by using the Index Update Settings API.

ElasticSearch takes care about load balancing, relocating, gathering the

results from nodes, etc. Experiment with different settings to fine-tune

your setup.

Use the Index Status API (http://localhost:9200/A/_status) to inspect

the index status.

#################################### Paths
####################################

Path to directory containing configuration (this file and logging.yml):

path.conf: /path/to/conf

Path to directory where to store index data allocated for this node.

path.data: /path/to/data

Can optionally include more than one location, causing data to be

striped across

the locations (a la RAID 0) on a file level, favouring locations with

most free

space on creation. For example:

path.data: /path/to/data1,/path/to/data2

Path to temporary files:

path.work: /path/to/work

Path to log files:

path.logs: /path/to/logs

Path to where plugins are installed:

path.plugins: /path/to/plugins

#################################### Plugin
###################################

If a plugin listed here is not installed for current node, the node will

not start.

plugin.mandatory: mapper-attachments,lang-groovy

################################### Memory
####################################

ElasticSearch performs poorly when JVM starts swapping: you should

ensure that

it never swaps.

Set this property to true to lock the memory:

bootstrap.mlockall: true

Make sure that the ES_MIN_MEM and ES_MAX_MEM environment variables are

set

to the same value, and that the machine has enough memory to allocate

for ElasticSearch, leaving enough memory for the operating system itself.

You should also make sure that the ElasticSearch process is allowed to

lock

the memory, eg. by using ulimit -l unlimited.

############################## Network And HTTP
###############################

ElasticSearch, by default, binds itself to the 0.0.0.0 address, and

listens

on port [9200-9300] for HTTP traffic and on port [9300-9400] for

node-to-node

communication. (the range means that if the port is busy, it will

automatically

try the next port).

Set the bind address specifically (IPv4 or IPv6):

network.bind_host: 192.168.0.1

Set the address other nodes will use to communicate with this node. If

not

set, it is automatically derived. It must point to an actual IP address.

network.publish_host: 192.168.0.1

Set both 'bind_host' and 'publish_host':

network.host: 192.168.0.1

Set a custom port for the node to node communication (9300 by default):

transport.tcp.port: 9300

Enable compression for all communication between nodes (disabled by

default):

transport.tcp.compress: true

Set a custom port to listen for HTTP traffic:

http.port: 9200

Set a custom allowed content length:

http.max_content_length: 100mb

Disable HTTP completely:

http.enabled: false

################################### Gateway
###################################

The gateway allows for persisting the cluster state between full cluster

restarts. Every change to the state (such as adding an index) will be

stored

in the gateway, and when the cluster starts up for the first time,

it will read its state from the gateway.

There are several types of gateway implementations. For more

information, see

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html

.

The default gateway type is the "local" gateway (recommended):

gateway.type: local

Settings below control how and when to start the initial recovery

process on

a full cluster restart (to reuse as much local data as possible when

using shared

gateway).

Allow recovery process after N nodes in a cluster are up:

gateway.recover_after_nodes: 1

Set the timeout to initiate the recovery process, once the N nodes

from previous setting are up (accepts time value):

gateway.recover_after_time: 5m

Set how many nodes are expected in this cluster. Once these N nodes

are up (and recover_after_nodes is met), begin recovery process

immediately

(without waiting for recover_after_time to expire):

gateway.expected_nodes: 2

############################# Recovery Throttling
#############################

These settings allow to control the process of shards allocation between

nodes during initial recovery, replica allocation, rebalancing,

or when adding and removing nodes.

Set the number of concurrent recoveries happening on a node:

1. During the initial recovery

cluster.routing.allocation.node_initial_primaries_recoveries: 4

2. During adding/removing nodes, rebalancing, etc

cluster.routing.allocation.node_concurrent_recoveries: 2

Set to throttle throughput when recovering (eg. 100mb, by default 20mb):

indices.recovery.max_bytes_per_sec: 20mb

Set to limit the number of open concurrent streams when

recovering a shard from a peer:

indices.recovery.concurrent_streams: 5

################################## Discovery
##################################

Discovery infrastructure ensures nodes can be found within a cluster

and master node is elected. Multicast discovery is the default.

Set to ensure a node sees N other master eligible nodes to be considered

operational within the cluster. Its recommended to set it to a higher

value

than 1 when running more than 2 nodes in the cluster.

discovery.zen.minimum_master_nodes: 1

Set the time to wait for ping responses from other nodes when

discovering.

Set this option to a higher value on a slow or congested network

to minimize discovery failures:

discovery.zen.ping.timeout: 3s

For more information, see

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html

Unicast discovery allows to explicitly control which nodes will be used

to discover the cluster. It can be used when multicast is not present,

or to restrict the cluster communication-wise.

1. Disable multicast discovery (enabled by default):

discovery.zen.ping.multicast.enabled: false

2. Configure an initial list of master nodes in the cluster

to perform discovery when new nodes (master or data) are started:

discovery.zen.ping.unicast.hosts: ["10.0.0.4", "10.0.0.5", "10.0.0.6"]

EC2 discovery allows to use AWS EC2 API in order to perform discovery.

You have to install the cloud-aws plugin for enabling the EC2 discovery.

For more information, see

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-ec2.html

See http://elasticsearch.org/tutorials/elasticsearch-on-ec2/

for a step-by-step tutorial.

################################## Slow Log
##################################

Shard level query and fetch threshold logging.

#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms
#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms
#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms
################################## GC Logging
################################
#monitor.jvm.gc.young.warn: 1000ms
#monitor.jvm.gc.young.info: 700ms
#monitor.jvm.gc.young.debug: 400ms
#monitor.jvm.gc.old.warn: 10s
#monitor.jvm.gc.old.info: 5s
#monitor.jvm.gc.old.debug: 2s
################################# AZURE PLUGIN
###############################
cloud:
azure:
keystore: c:/Certs/certificate.pfx
password: password
subscription_id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
service_name: apw-es-vms
discovery:
type: azure

Cluster 2 - brand new clean cluster with same base configuration and only
differences in elasticsearch.yml file are subscription id, service name and
cluster name. Once again none of the nodes in this configuration can see
each other.

I suspect this is a configuration issue but my experience with
elasticsearch is limited. Does anyone have any ideas what I could have
configured incorrectly?

Thanks

Andrew

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com <javascript:>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/12cca085-c7db-440e-8740-0bb973ff68a5%40googlegroups.com
.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f322a3b4-733f-4718-a051-0654e4d38ab2%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #4

So Azure plugin actually does not start but only multicast discovery.

I think it's because your elasticsearch.yml is not correct. Probably because a line is missing between cloud and discovery parts.
Also indentation can play a role here I guess (discovery should be indented at the same level as cloud and not inside cloud).

cloud:
azure:
keystore: c:/Certs/certificate.pfx
password: password
subscription_id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
service_name: apw-es-vms
discovery:
type: azure

I created a GIST here of something which should work: https://gist.github.com/dadoonet/8819191

Hope this helps

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

Le 4 février 2014 at 23:47:39, Andrew Westgarth (mail@hawaythelads.co.uk) a écrit:

Hi David,
here's the gist for the logs from the three nodes of Cluster 2 - https://gist.github.com/apwestgarth/8813941 first thing I noticed which is strange is node 1 is referring to the cluster as sageerpdev_escluster whereas node 2 and 3 are correctly referring to it as sageerpdevescluster. The config files (elasticsearch,yml) are the same on each node :s so not sure why that's happening.

I've since reverted cluster 1 back to unicast mode so I can carry on working with the old environment.

Thanks

Andrew

On Tuesday, 4 February 2014 22:22:49 UTC, David Pilato wrote:
Could you please GIST your logs on both nodes?
Also, could you change Log level to TRACE for discovery? (See config/logging.yml file)

Thanks

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 4 févr. 2014 à 22:42, Andrew Westgarth ma...@hawaythelads.co.uk a écrit :

Hi,
I read with interest the news about the Azure Cloud Plugin over the weekend and today have been trying to get it working with Windows VMs on Azure with mixed levels of success.

I have two environments/clusters one which has been running for a few weeks and another which is brand new and only been running for a couple of days; both have the head plugin installed so I can see the status of the cluster(s).

All of the clusters consist of 3 machines are using the Windows Server 2012 R2 Datacenter base image with java 7 added, and elasticsearch 0.90.10 installed as a service set to automatic startup.

Cluster 1 - been running using multicast discovery disabled and the ip addresses of the nodes listed. I have since installed the azure cloud plugin, added the certificate and configuration to the node and enabled multicast discovery again and commented out the list of ip addresses. Now when I view the details of the cluster, none of the nodes can see each other and the cluster health status is marked in amber as the full cluster is no longer available.

the elasticsearch.yml file is as follows:

##################### ElasticSearch Configuration Example #####################

This file contains an overview of various configuration settings,

targeted at operations staff. Application developers should

consult the guide at http://elasticsearch.org/guide.

The installation procedure is covered at

http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html.

ElasticSearch comes with reasonable defaults for most settings,

so you can try it out without bothering with configuration.

Most of the time, these defaults are just fine for running a production

cluster. If you're fine-tuning your cluster, or wondering about the

effect of certain configuration option, please do ask on the

mailing list or IRC channel [http://elasticsearch.org/community].

Any element in the configuration can be replaced with environment variables

by placing them in ${...} notation. For example:

node.rack: ${RACK_ENV_VAR}

For information on supported formats and syntax for the config file, see

http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html

################################### Cluster ###################################

Cluster name identifies your cluster for auto-discovery. If you're running

multiple clusters on the same network, make sure you're using unique names.

cluster.name: elasticsearch

#################################### Node #####################################

Node names are generated dynamically on startup, so you're relieved

from configuring them manually. You can tie this node to a specific name:

node.name: "Franz Kafka"

Every node can be configured to allow or deny being eligible as the master,

and to allow or deny to store the data.

Allow this node to be eligible as a master node (enabled by default):

node.master: true

Allow this node to store data (enabled by default):

node.data: true

You can exploit these settings to design advanced cluster topologies.

1. You want this node to never become a master node, only to hold data.

This will be the "workhorse" of your cluster.

node.master: false

node.data: true

2. You want this node to only serve as a master: to not store any data and

to have free resources. This will be the "coordinator" of your cluster.

node.master: true

node.data: false

3. You want this node to be neither master nor data node, but

to act as a "search load balancer" (fetching data from nodes,

aggregating results, etc.)

node.master: false

node.data: false

Use the Cluster Health API [http://localhost:9200/_cluster/health], the

Node Info API [http://localhost:9200/_cluster/nodes] or GUI tools

such as http://github.com/lukas-vlcek/bigdesk and

http://mobz.github.com/elasticsearch-head to inspect the cluster state.

A node can have generic attributes associated with it, which can later be used

for customized shard allocation filtering, or allocation awareness. An attribute

is a simple key value pair, similar to node.key: value, here is an example:

node.rack: rack314

By default, multiple nodes are allowed to start from the same installation location

to disable it, set the following:

node.max_local_storage_nodes: 1

#################################### Index ####################################

You can set a number of options (such as shard/replica options, mapping

or analyzer definitions, translog settings, ...) for indices globally,

in this file.

Note, that it makes more sense to configure index settings specifically for

a certain index, either when creating it or by using the index templates API.

See http://elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html and

http://elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html

for more information.

Set the number of shards (splits) of an index (5 by default):

index.number_of_shards: 5

Set the number of replicas (additional copies) of an index (1 by default):

index.number_of_replicas: 1

Note, that for development on a local machine, with small indices, it usually

makes sense to "disable" the distributed features:

index.number_of_shards: 1

index.number_of_replicas: 0

These settings directly affect the performance of index and search operations

in your cluster. Assuming you have enough machines to hold shards and

replicas, the rule of thumb is:

1. Having more shards enhances the indexing performance and allows to

distribute a big index across machines.

2. Having more replicas enhances the search performance and improves the

cluster availability.

The "number_of_shards" is a one-time setting for an index.

The "number_of_replicas" can be increased or decreased anytime,

by using the Index Update Settings API.

ElasticSearch takes care about load balancing, relocating, gathering the

results from nodes, etc. Experiment with different settings to fine-tune

your setup.

Use the Index Status API (http://localhost:9200/A/_status) to inspect

the index status.

#################################### Paths ####################################

Path to directory containing configuration (this file and logging.yml):

path.conf: /path/to/conf

Path to directory where to store index data allocated for this node.

path.data: /path/to/data

Can optionally include more than one location, causing data to be striped across

the locations (a la RAID 0) on a file level, favouring locations with most free

space on creation. For example:

path.data: /path/to/data1,/path/to/data2

Path to temporary files:

path.work: /path/to/work

Path to log files:

path.logs: /path/to/logs

Path to where plugins are installed:

path.plugins: /path/to/plugins

#################################### Plugin ###################################

If a plugin listed here is not installed for current node, the node will not start.

plugin.mandatory: mapper-attachments,lang-groovy

################################### Memory ####################################

ElasticSearch performs poorly when JVM starts swapping: you should ensure that

it never swaps.

Set this property to true to lock the memory:

bootstrap.mlockall: true

Make sure that the ES_MIN_MEM and ES_MAX_MEM environment variables are set

to the same value, and that the machine has enough memory to allocate

for ElasticSearch, leaving enough memory for the operating system itself.

You should also make sure that the ElasticSearch process is allowed to lock

the memory, eg. by using ulimit -l unlimited.

############################## Network And HTTP ###############################

ElasticSearch, by default, binds itself to the 0.0.0.0 address, and listens

on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node

communication. (the range means that if the port is busy, it will automatically

try the next port).

Set the bind address specifically (IPv4 or IPv6):

network.bind_host: 192.168.0.1

Set the address other nodes will use to communicate with this node. If not

set, it is automatically derived. It must point to an actual IP address.

network.publish_host: 192.168.0.1

Set both 'bind_host' and 'publish_host':

network.host: 192.168.0.1

Set a custom port for the node to node communication (9300 by default):

transport.tcp.port: 9300

Enable compression for all communication between nodes (disabled by default):

transport.tcp.compress: true

Set a custom port to listen for HTTP traffic:

http.port: 9200

Set a custom allowed content length:

http.max_content_length: 100mb

Disable HTTP completely:

http.enabled: false

################################### Gateway ###################################

The gateway allows for persisting the cluster state between full cluster

restarts. Every change to the state (such as adding an index) will be stored

in the gateway, and when the cluster starts up for the first time,

it will read its state from the gateway.

There are several types of gateway implementations. For more information, see

http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html.

The default gateway type is the "local" gateway (recommended):

gateway.type: local

Settings below control how and when to start the initial recovery process on

a full cluster restart (to reuse as much local data as possible when using shared

gateway).

Allow recovery process after N nodes in a cluster are up:

gateway.recover_after_nodes: 1

Set the timeout to initiate the recovery process, once the N nodes

from previous setting are up (accepts time value):

gateway.recover_after_time: 5m

Set how many nodes are expected in this cluster. Once these N nodes

are up (and recover_after_nodes is met), begin recovery process immediately

(without waiting for recover_after_time to expire):

gateway.expected_nodes: 2

############################# Recovery Throttling #############################

These settings allow to control the process of shards allocation between

nodes during initial recovery, replica allocation, rebalancing,

or when adding and removing nodes.

Set the number of concurrent recoveries happening on a node:

1. During the initial recovery

cluster.routing.allocation.node_initial_primaries_recoveries: 4

2. During adding/removing nodes, rebalancing, etc

cluster.routing.allocation.node_concurrent_recoveries: 2

Set to throttle throughput when recovering (eg. 100mb, by default 20mb):

indices.recovery.max_bytes_per_sec: 20mb

Set to limit the number of open concurrent streams when

recovering a shard from a peer:

indices.recovery.concurrent_streams: 5

################################## Discovery ##################################

Discovery infrastructure ensures nodes can be found within a cluster

and master node is elected. Multicast discovery is the default.

Set to ensure a node sees N other master eligible nodes to be considered

operational within the cluster. Its recommended to set it to a higher value

than 1 when running more than 2 nodes in the cluster.

discovery.zen.minimum_master_nodes: 1

Set the time to wait for ping responses from other nodes when discovering.

Set this option to a higher value on a slow or congested network

to minimize discovery failures:

discovery.zen.ping.timeout: 3s

For more information, see

http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html

Unicast discovery allows to explicitly control which nodes will be used

to discover the cluster. It can be used when multicast is not present,

or to restrict the cluster communication-wise.

1. Disable multicast discovery (enabled by default):

discovery.zen.ping.multicast.enabled: false

2. Configure an initial list of master nodes in the cluster

to perform discovery when new nodes (master or data) are started:

discovery.zen.ping.unicast.hosts: ["10.0.0.4", "10.0.0.5", "10.0.0.6"]

EC2 discovery allows to use AWS EC2 API in order to perform discovery.

You have to install the cloud-aws plugin for enabling the EC2 discovery.

For more information, see

http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-ec2.html

See http://elasticsearch.org/tutorials/elasticsearch-on-ec2/

for a step-by-step tutorial.

################################## Slow Log ##################################

Shard level query and fetch threshold logging.

#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms
#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms
#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms
################################## GC Logging ################################
#monitor.jvm.gc.young.warn: 1000ms
#monitor.jvm.gc.young.info: 700ms
#monitor.jvm.gc.young.debug: 400ms
#monitor.jvm.gc.old.warn: 10s
#monitor.jvm.gc.old.info: 5s
#monitor.jvm.gc.old.debug: 2s
################################# AZURE PLUGIN ###############################
cloud:
azure:
keystore: c:/Certs/certificate.pfx
password: password
subscription_id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
service_name: apw-es-vms
discovery:
type: azure

Cluster 2 - brand new clean cluster with same base configuration and only differences in elasticsearch.yml file are subscription id, service name and cluster name. Once again none of the nodes in this configuration can see each other.

I suspect this is a configuration issue but my experience with elasticsearch is limited. Does anyone have any ideas what I could have configured incorrectly?

Thanks

Andrew

You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/12cca085-c7db-440e-8740-0bb973ff68a5%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f322a3b4-733f-4718-a051-0654e4d38ab2%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/etPan.52f1f3b0.3f2dba31.d955%40MacBook-Air-de-David.local.
For more options, visit https://groups.google.com/groups/opt_out.


(Andrew Westgarth) #5

Hi David,
thanks for taking a look and creating the new GIST. I've
uploaded an updated elasticsearch.yml file and all are now being seen by
each other! :slight_smile: I also resolved the issue on node 1 as it had two config
files, one in the root which was overriding one in the config directory.

One thing I'm learning about Elasticsearch over the last month is that the
yml files are very particular about format and it's very easy to get it
wrong despite using tooling (I was using a visual studio plugin and am now
using Notepad++ and still didn't see the issues)

Thanks for your help, I'll update my draft blog post and publish this
morning.

Thanks

Andrew

On Wednesday, 5 February 2014 08:17:52 UTC, David Pilato wrote:

So Azure plugin actually does not start but only multicast discovery.

I think it's because your elasticsearch.yml is not correct. Probably
because a line is missing between cloud and discovery parts.
Also indentation can play a role here I guess (discovery should be
indented at the same level as cloud and not inside cloud).

cloud:
azure:
keystore: c:/Certs/certificate.pfx
password: password
subscription_id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
service_name: apw-es-vms
discovery:
type: azure

I created a GIST here of something which should work:
https://gist.github.com/dadoonet/8819191

Hope this helps

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfrhttps://twitter.com/elasticsearchfr

Le 4 février 2014 at 23:47:39, Andrew Westgarth (ma...@hawaythelads.co.uk<javascript:>)
a écrit:

Hi David,
here's the gist for the logs from the three nodes of Cluster 2

  • https://gist.github.com/apwestgarth/8813941 first thing I noticed which
    is strange is node 1 is referring to the cluster as sageerpdev_escluster
    whereas node 2 and 3 are correctly referring to it as sageerpdevescluster.
    The config files (elasticsearch,yml) are the same on each node :s so not
    sure why that's happening.

I've since reverted cluster 1 back to unicast mode so I can carry on
working with the old environment.

Thanks

Andrew

On Tuesday, 4 February 2014 22:22:49 UTC, David Pilato wrote:

Could you please GIST your logs on both nodes?
Also, could you change Log level to TRACE for discovery? (See
config/logging.yml file)

Thanks

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 4 févr. 2014 à 22:42, Andrew Westgarth ma...@hawaythelads.co.uk a
écrit :

Hi,
I read with interest the news about the Azure Cloud Plugin over the
weekend and today have been trying to get it working with Windows VMs on
Azure with mixed levels of success.

I have two environments/clusters one which has been running for a few
weeks and another which is brand new and only been running for a couple of
days; both have the head plugin installed so I can see the status of the
cluster(s).

All of the clusters consist of 3 machines are using the Windows Server
2012 R2 Datacenter base image with java 7 added, and elasticsearch 0.90.10
installed as a service set to automatic startup.

Cluster 1 - been running using multicast discovery disabled and the ip
addresses of the nodes listed. I have since installed the azure cloud
plugin, added the certificate and configuration to the node and enabled
multicast discovery again and commented out the list of ip addresses. Now
when I view the details of the cluster, none of the nodes can see each
other and the cluster health status is marked in amber as the full cluster
is no longer available.

the elasticsearch.yml file is as follows:

##################### ElasticSearch Configuration Example
#####################

This file contains an overview of various configuration settings,

targeted at operations staff. Application developers should

consult the guide at http://elasticsearch.org/guide.

The installation procedure is covered at

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html

.

ElasticSearch comes with reasonable defaults for most settings,

so you can try it out without bothering with configuration.

Most of the time, these defaults are just fine for running a production

cluster. If you're fine-tuning your cluster, or wondering about the

effect of certain configuration option, please do ask on the

mailing list or IRC channel [http://elasticsearch.org/community].

Any element in the configuration can be replaced with environment

variables

by placing them in ${...} notation. For example:

node.rack: ${RACK_ENV_VAR}

For information on supported formats and syntax for the config file, see

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html

################################### Cluster
###################################

Cluster name identifies your cluster for auto-discovery. If you're

running

multiple clusters on the same network, make sure you're using unique

names.

cluster.name: elasticsearch

#################################### Node
#####################################

Node names are generated dynamically on startup, so you're relieved

from configuring them manually. You can tie this node to a specific

name:

node.name: "Franz Kafka"

Every node can be configured to allow or deny being eligible as the

master,

and to allow or deny to store the data.

Allow this node to be eligible as a master node (enabled by default):

node.master: true

Allow this node to store data (enabled by default):

node.data: true

You can exploit these settings to design advanced cluster topologies.

1. You want this node to never become a master node, only to hold data.

This will be the "workhorse" of your cluster.

node.master: false

node.data: true

2. You want this node to only serve as a master: to not store any data

and

to have free resources. This will be the "coordinator" of your

cluster.

node.master: true

node.data: false

3. You want this node to be neither master nor data node, but

to act as a "search load balancer" (fetching data from nodes,

aggregating results, etc.)

node.master: false

node.data: false

Use the Cluster Health API [http://localhost:9200/_cluster/health], the

Node Info API [http://localhost:9200/_cluster/nodes] or GUI tools

such as http://github.com/lukas-vlcek/bigdesk and

http://mobz.github.com/elasticsearch-head to inspect the cluster

state.

A node can have generic attributes associated with it, which can later

be used

for customized shard allocation filtering, or allocation awareness. An

attribute

is a simple key value pair, similar to node.key: value, here is an

example:

node.rack: rack314

By default, multiple nodes are allowed to start from the same

installation location

to disable it, set the following:

node.max_local_storage_nodes: 1

#################################### Index
####################################

You can set a number of options (such as shard/replica options, mapping

or analyzer definitions, translog settings, ...) for indices globally,

in this file.

Note, that it makes more sense to configure index settings specifically

for

a certain index, either when creating it or by using the index

templates API.

See <

http://elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html>
and

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html

for more information.

Set the number of shards (splits) of an index (5 by default):

index.number_of_shards: 5

Set the number of replicas (additional copies) of an index (1 by

default):

index.number_of_replicas: 1

Note, that for development on a local machine, with small indices, it

usually

makes sense to "disable" the distributed features:

index.number_of_shards: 1

index.number_of_replicas: 0

These settings directly affect the performance of index and search

operations

in your cluster. Assuming you have enough machines to hold shards and

replicas, the rule of thumb is:

1. Having more shards enhances the indexing performance and allows

to

distribute a big index across machines.

2. Having more replicas enhances the search performance and

improves the

cluster availability.

The "number_of_shards" is a one-time setting for an index.

The "number_of_replicas" can be increased or decreased anytime,

by using the Index Update Settings API.

ElasticSearch takes care about load balancing, relocating, gathering the

results from nodes, etc. Experiment with different settings to fine-tune

your setup.

Use the Index Status API (http://localhost:9200/A/_status) to inspect

the index status.

#################################### Paths
####################################

Path to directory containing configuration (this file and logging.yml):

path.conf: /path/to/conf

Path to directory where to store index data allocated for this node.

path.data: /path/to/data

Can optionally include more than one location, causing data to be

striped across

the locations (a la RAID 0) on a file level, favouring locations with

most free

space on creation. For example:

path.data: /path/to/data1,/path/to/data2

Path to temporary files:

path.work: /path/to/work

Path to log files:

path.logs: /path/to/logs

Path to where plugins are installed:

path.plugins: /path/to/plugins

#################################### Plugin
###################################

If a plugin listed here is not installed for current node, the node

will not start.

plugin.mandatory: mapper-attachments,lang-groovy

################################### Memory
####################################

ElasticSearch performs poorly when JVM starts swapping: you should

ensure that

it never swaps.

Set this property to true to lock the memory:

bootstrap.mlockall: true

Make sure that the ES_MIN_MEM and ES_MAX_MEM environment variables are

set

to the same value, and that the machine has enough memory to allocate

for ElasticSearch, leaving enough memory for the operating system

itself.

You should also make sure that the ElasticSearch process is allowed to

lock

the memory, eg. by using ulimit -l unlimited.

############################## Network And HTTP
###############################

ElasticSearch, by default, binds itself to the 0.0.0.0 address, and

listens

on port [9200-9300] for HTTP traffic and on port [9300-9400] for

node-to-node

communication. (the range means that if the port is busy, it will

automatically

try the next port).

Set the bind address specifically (IPv4 or IPv6):

network.bind_host: 192.168.0.1

Set the address other nodes will use to communicate with this node. If

not

set, it is automatically derived. It must point to an actual IP address.

network.publish_host: 192.168.0.1

Set both 'bind_host' and 'publish_host':

network.host: 192.168.0.1

Set a custom port for the node to node communication (9300 by default):

transport.tcp.port: 9300

Enable compression for all communication between nodes (disabled by

default):

transport.tcp.compress: true

Set a custom port to listen for HTTP traffic:

http.port: 9200

Set a custom allowed content length:

http.max_content_length: 100mb

Disable HTTP completely:

http.enabled: false

################################### Gateway
###################################

The gateway allows for persisting the cluster state between full cluster

restarts. Every change to the state (such as adding an index) will be

stored

in the gateway, and when the cluster starts up for the first time,

it will read its state from the gateway.

There are several types of gateway implementations. For more

information, see

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html

.

The default gateway type is the "local" gateway (recommended):

gateway.type: local

Settings below control how and when to start the initial recovery

process on

a full cluster restart (to reuse as much local data as possible when

using shared

gateway).

Allow recovery process after N nodes in a cluster are up:

gateway.recover_after_nodes: 1

Set the timeout to initiate the recovery process, once the N nodes

from previous setting are up (accepts time value):

gateway.recover_after_time: 5m

Set how many nodes are expected in this cluster. Once these N nodes

are up (and recover_after_nodes is met), begin recovery process

immediately

(without waiting for recover_after_time to expire):

gateway.expected_nodes: 2

############################# Recovery Throttling
#############################

These settings allow to control the process of shards allocation between

nodes during initial recovery, replica allocation, rebalancing,

or when adding and removing nodes.

Set the number of concurrent recoveries happening on a node:

1. During the initial recovery

cluster.routing.allocation.node_initial_primaries_recoveries: 4

2. During adding/removing nodes, rebalancing, etc

cluster.routing.allocation.node_concurrent_recoveries: 2

Set to throttle throughput when recovering (eg. 100mb, by default 20mb):

indices.recovery.max_bytes_per_sec: 20mb

Set to limit the number of open concurrent streams when

recovering a shard from a peer:

indices.recovery.concurrent_streams: 5

################################## Discovery
##################################

Discovery infrastructure ensures nodes can be found within a cluster

and master node is elected. Multicast discovery is the default.

Set to ensure a node sees N other master eligible nodes to be considered

operational within the cluster. Its recommended to set it to a higher

value

than 1 when running more than 2 nodes in the cluster.

discovery.zen.minimum_master_nodes: 1

Set the time to wait for ping responses from other nodes when

discovering.

Set this option to a higher value on a slow or congested network

to minimize discovery failures:

discovery.zen.ping.timeout: 3s

For more information, see

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html

Unicast discovery allows to explicitly control which nodes will be used

to discover the cluster. It can be used when multicast is not present,

or to restrict the cluster communication-wise.

1. Disable multicast discovery (enabled by default):

discovery.zen.ping.multicast.enabled: false

2. Configure an initial list of master nodes in the cluster

to perform discovery when new nodes (master or data) are started:

discovery.zen.ping.unicast.hosts: ["10.0.0.4", "10.0.0.5", "10.0.0.6"]

EC2 discovery allows to use AWS EC2 API in order to perform discovery.

You have to install the cloud-aws plugin for enabling the EC2 discovery.

For more information, see

<

http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-ec2.html

See http://elasticsearch.org/tutorials/elasticsearch-on-ec2/

for a step-by-step tutorial.

################################## Slow Log
##################################

Shard level query and fetch threshold logging.

#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms
#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms
#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms
################################## GC Logging
################################
#monitor.jvm.gc.young.warn: 1000ms
#monitor.jvm.gc.young.info: 700ms
#monitor.jvm.gc.young.debug: 400ms
#monitor.jvm.gc.old.warn: 10s
#monitor.jvm.gc.old.info: 5s
#monitor.jvm.gc.old.debug: 2s
################################# AZURE PLUGIN
###############################
cloud:
azure:
keystore: c:/Certs/certificate.pfx
password: password
subscription_id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
service_name: apw-es-vms
discovery:
type: azure

Cluster 2 - brand new clean cluster with same base configuration and only
differences in elasticsearch.yml file are subscription id, service name and
cluster name. Once again none of the nodes in this configuration can see
each other.

I suspect this is a configuration issue but my experience with
elasticsearch is limited. Does anyone have any ideas what I could have
configured incorrectly?

Thanks

Andrew

You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/12cca085-c7db-440e-8740-0bb973ff68a5%40googlegroups.com
.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com <javascript:>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/f322a3b4-733f-4718-a051-0654e4d38ab2%40googlegroups.com
.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/37603911-6d7f-4351-b47d-b80f5014a11f%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(system) #6