Elasticsearch LXC on Ubuntu 14.04 and recommended settings


(engel der) #1

Hi,

we are setting up an Elasticsearch 1.0 (RC2) cluster and I think I need some
help with where to start (settings related). We have 6 physical servers
with 265GB RAM and 2TB local SAS storage (separated into two RAID 10 groups as
LVM VGs). Those six servers are running Ubuntu 14.04. All "roles"
(Application Server [NGINX+PHP-FPM+GlusterFS-Client+Elasticsearch
"searcher"], Database Server [Galera Cluster], Storage Server [GlusterFS],
Cache Server [Redis] ...) will be running in LXC containers. Most of them
run Ubuntu 14.04; only the Galera Cluster runs 12.04.
We expect about 100GB of data to index, and the data does not change that
fast (5% per day?). The idea is to install Elasticsearch on all 6
Application Servers as "searcher" with:

cluster.name: search001
node.master: false
node.data: false
#node.master: true
#node.data: true
node.max_local_storage_nodes: 1
index.number_of_shards: 5
index.number_of_replicas: 2

Add 3 "data" Nodes with:

cluster.name: search001
node.master: false
#node.data: false
#node.master: true
node.data: true
node.max_local_storage_nodes: 1
index.number_of_shards: 5
index.number_of_replicas: 2
bootstrap.mlockall: true

and 3 "master" nodes:

cluster.name: search001
#node.master: false
node.data: false
node.master: true
#node.data: true
node.max_local_storage_nodes: 1
index.number_of_shards: 5
index.number_of_replicas: 2

The LXCs for the "searchers" get 8GB RAM, the "masters" get 2GB RAM, and
the "data" LXCs get 60GB RAM and 300GB storage.

What about the Java settings for those "data" nodes???

cat /etc/default/elasticsearch

# Run Elasticsearch as this user ID and group ID
ES_USER=elasticsearch
ES_GROUP=elasticsearch

# Heap Size (defaults to 256m min, 1g max)
ES_HEAP_SIZE=30g

# Heap new generation
ES_HEAP_NEWSIZE=1g

# max direct memory
ES_DIRECT_SIZE=???

# Maximum number of open files, defaults to 65535.
MAX_OPEN_FILES=65535

# Maximum locked memory size. Set to "unlimited" if you use the
# bootstrap.mlockall option in elasticsearch.yml. You must also set
# ES_HEAP_SIZE.
MAX_LOCKED_MEMORY=unlimited

# Maximum number of VMA (Virtual Memory Areas) a process can own
MAX_MAP_COUNT=262144 #more????

# Elasticsearch log directory
#LOG_DIR=/var/log/elasticsearch

# Elasticsearch data directory
#DATA_DIR=/var/lib/elasticsearch

# Elasticsearch work directory
#WORK_DIR=/tmp/elasticsearch

# Elasticsearch configuration directory
#CONF_DIR=/etc/elasticsearch

# Elasticsearch configuration file (elasticsearch.yml)
#CONF_FILE=/etc/elasticsearch/elasticsearch.yml

# Additional Java OPTS
#ES_JAVA_OPTS=

# Configure restart on package upgrade (true, every other setting will lead
# to not restarting)
#RESTART_ON_UPGRADE=true

What about the master and searcher settings? I guess I do not have to tune
them?

Thank you for any help!

Regards,
Flo

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/15991fc2-749e-4197-b854-88b5f64fc14a%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Tony Su) #2

Hi Enger,
Although I don't yet have enough experience building ES clusters to
directly answer your question(s), when I'm involved in this type of
provisioning I generally start off with a set of objectives and then design
accordingly. I'd be interested in your list of objectives, so they can be
matched against your proposed configuration.

Thx,
Tony



(Mark Walkom) #3

That looks ok, similar to how we do things with virtualised master/data
nodes.
I wouldn't specify your shard/replica count on the node though, do it in
the index as it allows you to change with ease.
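
As a rough illustration of "change with ease": the replica count of an existing index can be updated on a live cluster through the index settings API, without restarting any node. The sketch below is only an assumption of how that could look; the URL `http://localhost:9200` and the index name `myindex` are hypothetical placeholders, not values from this thread. Note that the number of primary shards, unlike the replica count, is fixed when the index is created.

```shell
#!/bin/sh
# Assumed values: adjust to your own cluster and index.
ES_URL="${ES_URL:-http://localhost:9200}"   # hypothetical node address
INDEX="${INDEX:-myindex}"                   # hypothetical index name

# Build the JSON body for a replica-count update.
# $1 = desired number of replicas per primary shard
replica_settings() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

# Send the update to the index settings endpoint of a running cluster.
# $1 = desired number of replicas per primary shard
set_replicas() {
  curl -XPUT "$ES_URL/$INDEX/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(replica_settings "$1")"
}
```

Calling `set_replicas 1` would then ask the cluster to keep one replica per shard for that index only; other indices are unaffected.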

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(engel der) #4

Hi Mark,

thank you for your answer.

On Tuesday, 4 February 2014 23:08:32 UTC+1, Mark Walkom wrote:

That looks ok, similar to how we do things with virtualised master/data
nodes.
I wouldn't specify your shard/replica count on the node though, do it in
the index as it allows you to change with ease.

What do you mean by "specify your shard/replica count on the node"? I
specified them in the config file /etc/elasticsearch/elasticsearch.yml. Is
that wrong? Where else should I specify them?

Regards,
Flo



(Mark Walkom) #5

If you specify the shard/replica settings in the elasticsearch.yml file
and later decide to change them, you have to update the file and
restart every node, which is a hassle.

Instead, just specify them when you create the index with a curl request;
it's a lot easier.
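
A minimal sketch of such a request, assuming a node reachable at `http://localhost:9200` and a hypothetical index name `myindex` (neither appears in this thread). The shard and replica values are the ones from the original elasticsearch.yml (5 shards, 2 replicas); with this approach the `index.number_of_shards` and `index.number_of_replicas` lines can be left out of the node configuration.

```shell
#!/bin/sh
# Assumed values: adjust to your own cluster and index.
ES_URL="${ES_URL:-http://localhost:9200}"   # hypothetical node address
INDEX="${INDEX:-myindex}"                   # hypothetical index name

# Build the JSON body for index creation.
# $1 = number of primary shards, $2 = number of replicas per primary
index_settings() {
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}' "$1" "$2"
}

# Create the index with 5 primary shards and 2 replicas per shard.
create_index() {
  curl -XPUT "$ES_URL/$INDEX" \
    -H 'Content-Type: application/json' \
    -d "$(index_settings 5 2)"
}
```

Running `create_index` once against any node is enough; the settings are stored with the index in the cluster state rather than in each node's config file.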

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


