JDK fails with out-of-memory error / ES critical index counts


(Nish) #1

Elasticsearch is set up as a single-node instance on a 60G RAM, 32*2.6GHz
machine. I am actively indexing historic data with Logstash. It worked well
with ~300 million documents (search and indexing were doing OK), but all
of a sudden ES fails to start and stay up. It runs for a few
minutes and I can query it, but then it fails with an out-of-memory error. I
monitor the memory, and at least 12G is available when it fails. I had set
ES_HEAP_SIZE to 31G and then reduced it to 28G, 24G and 18G, and I get the same
error every time (see dump below).

*My security limits are as follows (this is a test/POC server, hence the
"root" user):*

root soft nofile 65536
root hard nofile 65536
root - memlock unlimited

*ES settings *
config]# grep -v "^#" elasticsearch.yml | grep -v "^$"
bootstrap.mlockall: true

echo $ES_HEAP_SIZE
18432m

---DUMP----

bin/elasticsearch

[2014-05-04 13:30:12,653][INFO ][node ] [Sabretooth] version[1.1.1], pid[19309], build[f1585f0/2014-04-16T14:27:12Z]
[2014-05-04 13:30:12,653][INFO ][node ] [Sabretooth] initializing ...
[2014-05-04 13:30:12,669][INFO ][plugins ] [Sabretooth] loaded [], sites []
[2014-05-04 13:30:15,390][INFO ][node ] [Sabretooth] initialized
[2014-05-04 13:30:15,390][INFO ][node ] [Sabretooth] starting ...
[2014-05-04 13:30:15,531][INFO ][transport ] [Sabretooth] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.109.136.59:9300]}
[2014-05-04 13:30:18,553][INFO ][cluster.service ] [Sabretooth] new_master [Sabretooth][eocFkTYMQnSTUar94A2vHw][ip-10-109-136-59][inet[/10.109.136.59:9300]], reason: zen-disco-join (elected_as_master)
[2014-05-04 13:30:18,579][INFO ][discovery ] [Sabretooth] elasticsearch/eocFkTYMQnSTUar94A2vHw
[2014-05-04 13:30:18,790][INFO ][http ] [Sabretooth] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.109.136.59:9200]}
[2014-05-04 13:30:19,976][INFO ][gateway ] [Sabretooth] recovered [278] indices into cluster_state
[2014-05-04 13:30:19,984][INFO ][node ] [Sabretooth] started
OpenJDK 64-Bit Server VM warning: Attempt to protect stack guard pages failed.
OpenJDK 64-Bit Server VM warning: Attempt to deallocate stack guard pages failed.
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007f7c70000, 196608, 0) failed; error='Cannot allocate memory' (errno=12)

There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (malloc) failed to allocate 196608 bytes for committing reserved memory.
An error report file with more information is saved as:
/tmp/jvm-19309/hs_error.log


*User untergeek on #logstash told me that I have reached the maximum number
of indices for a single node. Here are my questions:*

  1. Can I move half of my indexes to a new node? If yes, how do I do that
    without compromising the indexes?
  2. Logstash makes 1 index per day and I want to have 2 years of data
    indexed. Can I combine multiple indexes into one, e.g. one index per
    month? That would mean I never have more than 24 indexes.
  3. How many nodes are ideal for 24 months of data at ~1.5G a day?
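
To put numbers on questions 2 and 3, here is a minimal Python sketch of the arithmetic. It assumes 2 years means 730 days, that the daily indexes follow Logstash's default `logstash-YYYY.MM.dd` naming (as the log lines in this thread suggest), and that a monthly index would simply drop the day part; `monthly_index_name` and `index_names` are hypothetical helpers, not part of any tool.

```python
from datetime import date, timedelta

DAILY_GB = 1.5          # from the question: ~1.5G of data a day
RETENTION_DAYS = 730    # assumption: 2 years taken as 2 * 365 days


def monthly_index_name(daily_name):
    """Map 'logstash-2013.12.31' to 'logstash-2013.12' by dropping the day part."""
    prefix, _, _day = daily_name.rpartition(".")
    return prefix


def index_names(start, days):
    """Daily index names in the logstash-YYYY.MM.dd style, one per day."""
    return [
        "logstash-" + (start + timedelta(days=i)).strftime("%Y.%m.%d")
        for i in range(days)
    ]


# Hypothetical retention window starting on the first day of a month.
daily = index_names(date(2012, 5, 1), RETENTION_DAYS)
monthly = sorted({monthly_index_name(name) for name in daily})
total_gb = DAILY_GB * RETENTION_DAYS

print(len(daily), "daily indexes")
print(len(monthly), "monthly indexes")
print(total_gb, "GB in total")
```

Under these assumptions the retention window holds 730 daily indexes or 24 monthly ones, and roughly 1.1T of data either way. The sketch only covers naming and counting: the index pattern decides where new data goes, and the existing daily indexes would still have to be reindexed into the monthly ones.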

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f1d8685d-b230-47c8-b52c-71808545059c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Nish) #2

So I tried to move some of the indexes to a new node.
I moved 1 index first, and when the Elasticsearch master encountered the data
node, it decided: "hey, I don't have all the indexes that are on the other
node, so let's copy it".
*MASTER: *
[2014-05-05 17:59:43,158][INFO ][gateway.local.state.meta ] [node1] auto
importing dangled indices [logstash-2013.12.31/OPEN] from
[[node2][aV-5pfthS8aEGlHRlAgFwA][ip-10-63-24-14][inet[/10.63.24.14:9300]]{master=false}]

*NODE: *
[2014-05-05 17:59:33,974][INFO ][gateway.local.state.meta ] [node2]
[logstash-2013.12.31] dangling index, exists on local file system, but not
in cluster metadata, scheduling to delete in [2h], auto import to cluster
state [YES]

If I set gateway.local.auto_import_dangled=no, it says that it will
delete the indexes in 2 hours (the default timeout), which I don't want.
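
When many indexes are moved at once, it helps to list exactly which ones a node reports as dangling before the timeout expires. The following is a small Python sketch that pulls the index name, delete timeout and auto-import state out of log text. The pattern is taken from the message wording quoted in this post (ES 1.1.1), so it is an assumption that other versions word it the same way; `parse_dangling` is a hypothetical helper.

```python
import re

# Pattern for the data-node message quoted above (wording as seen in this thread).
DANGLING = re.compile(
    r"\[(?P<index>[^\]]+)\] dangling index, exists on local file system, "
    r"but not in cluster metadata, scheduling to delete in \[(?P<timeout>[^\]]+)\], "
    r"auto import to cluster state \[(?P<auto_import>[^\]]+)\]"
)


def parse_dangling(log_text):
    """Return one dict per dangling-index message found in the log text."""
    # Collapse line wrapping so a message split over several lines still matches.
    flat = " ".join(log_text.split())
    return [match.groupdict() for match in DANGLING.finditer(flat)]


sample = """[2014-05-05 17:59:33,974][INFO ][gateway.local.state.meta ] [node2]
[logstash-2013.12.31] dangling index, exists on local file system, but not
in cluster metadata, scheduling to delete in [2h], auto import to cluster
state [YES]"""

for entry in parse_dangling(sample):
    print(entry["index"], entry["timeout"], entry["auto_import"])
```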

So we are back to square one. I can't seem to get the most "fundamental"
distributed behavior out of Elasticsearch if it keeps all indexes on all
nodes, and Elasticsearch cannot start because it now has all the indexes
on one node, which is too many.



(Nate Fox) #3

How many indexes do you have? It almost looks like the system itself can't
allocate the RAM needed.
You might try jacking up the nofile limit to something like 999999 as well.
I'd definitely go with a 31g heap size.

As for moving indexes, you might be able to copy the entire data store,
then remove some (I think there are some state files that tell ES/Lucene
which indexes are on disk), so it might recover if it's missing some and
sees the others on another node?

As for your other questions, how many nodes you need depends on usage,
especially search activity while indexing. We have 230 indexes (1740
shards) on 8 data nodes (5.7TB / 6.1B docs). So it can definitely handle a
lot more than what you're throwing at it. We don't search often, nor do we
load a ton of data at once.
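
For comparison, the per-node shard load in the two setups can be worked out with a few lines of Python. This is a rough sketch only: the 8-node figures come from the reply above, while the 5 primary shards per index used for the single-node setup is an assumption (the Elasticsearch 1.x default), not something stated in the thread.

```python
# Figures quoted in the reply above.
nate_indexes, nate_shards, nate_nodes = 230, 1740, 8

# Figures from the original post; 5 primaries per index is an ASSUMPTION
# (the Elasticsearch 1.x default), not a number given in the thread.
nish_indexes, nish_nodes = 278, 1
assumed_primaries_per_index = 5

nate_shards_per_node = nate_shards / nate_nodes
nish_shards_per_node = nish_indexes * assumed_primaries_per_index / nish_nodes

print(f"8-node cluster: {nate_shards_per_node:.1f} shards per node")
print(f"single node:    {nish_shards_per_node:.0f} primary shards per node")
```

If the default shard count applies, the single node carries about 1390 primary shards against roughly 217 per node in the 8-node cluster, so the total data volume is small but the per-node shard count is not.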



(Nish) #4

Currently I have 279 indexes on a single node, and Elasticsearch runs for a
few minutes and then dies. I only have 60G of RAM on this machine, and as far
as I know 60% is the most that one should allocate to Elasticsearch. I tried
allocating 38G; it lasted a few minutes longer and then died.

"(I think there's some state files that tell ES/Lucene which indexes are on
disk)"
=> Where is this? How do I fix it so that it doesn't move all
indexes to all nodes? I want to split the ~280 indexes across two nodes,
140 each. So far I have not been able to achieve this, as the master keeps
moving indexes to itself!



(Nate Fox) #5

You might turn off the bootstrap.mlockall flag just for now. It'll make ES
swap a ton, but your error message looks like an OS-level issue. Make sure
you have lots of swap available and grab some coffee.

What I'd also try if turning off bootstrap.mlockall doesn't work:

  • Tarball the entire data directory and save the tarball somewhere (unless
    you don't care about the data)
  • Set 31GB for your ES heap. There are plenty of docs out there that say not
    to go over 32GB of heap, because above that the JVM can no longer use
    compressed object pointers.
  • Copy the entire data dir to node2
  • Go into the data dir on node1 and delete half of the indexes
  • Go into the data dir on node2 and delete the other half of the indexes
  • Fire up both nodes, make sure they both have the same cluster name
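
The heap-size bullet above can be turned into a quick calculation. This is a minimal Python sketch of a commonly cited rule of thumb (give the heap about half of the RAM and stay below roughly 32G). The function names and the 31G default ceiling are assumptions for illustration; the thread itself only states the 31G figure.

```python
def suggested_heap_gb(ram_gb, ceiling_gb=31):
    """Rule of thumb: half of the RAM, capped just below the ~32G threshold."""
    return min(ram_gb // 2, ceiling_gb)


def es_heap_size_value(ram_gb):
    """Format the suggestion the way ES_HEAP_SIZE is usually written, e.g. '30g'."""
    return "%dg" % suggested_heap_gb(ram_gb)


print(es_heap_size_value(60))    # the 60G machine from this thread
print(es_heap_size_value(128))   # a larger machine hits the ceiling
```

For the 60G machine this gives a 30g heap, which leaves the other half of the memory to the OS and the file system cache.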

I have no idea if this'll work, I'm by no means an ES expert. :)



(Nish) #6

"- Fire up both nodes, make sure they both have the same cluster name"
<= As I wrote in my second message, this is exactly where Elasticsearch
is messing up. When I move an index to a new node, delete that index
from the master, and then start the master node and the other data node,
the master logs this message:
"auto importing dangled indices"
This means the master is now copying the "deleted" index, which exists only
on the other node, to itself!

Basically this is what happens:

  1. Node1 (master) has: rock, paper, scissors
  2. I move rock from node1 to node2 (I verify by starting ONLY node1,
    and I can see that the data originally in the "rock" index is missing,
    as expected, all good)
  3. So node1 now has paper, scissors
  4. I start node2 with ONLY the "rock" index (verified independently, it works)
  5. Then I start node1 (master) and node2 (data)
  6. Node1 says "hey, I don't have rock, but node2 has it, let me copy
    it to myself"



(Nate Fox) #7

Get node2 running with rock. Then disable allocation and bring up node1:

curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'

From there, set the number of replicas on the indexes down to 0 so they
don't copy. Once that's set, change disable_allocation back to false.
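
The sequence described above could be sketched roughly as follows. This is only a dry run that prints the three curl commands in order. The host and the use of the "rock" index as the target are assumptions, and the replica body relies on the standard index settings endpoint rather than anything shown in this thread.

```shell
# Sketch only: host and index name are assumptions; the allocation
# setting name is the one quoted in the post above.
ES_HOST="${ES_HOST:-localhost:9200}"

# Print the JSON body that turns shard allocation off (true) or back on (false).
allocation_body() {
  printf '{"transient":{"cluster.routing.allocation.disable_allocation":%s}}\n' "$1"
}

# Print the JSON body that sets the number of replicas for an index.
replicas_body() {
  printf '{"index":{"number_of_replicas":%s}}\n' "$1"
}

# Dry run: print the three commands in the order described above.
echo "curl -XPUT ${ES_HOST}/_cluster/settings -d '$(allocation_body true)'"
echo "curl -XPUT ${ES_HOST}/rock/_settings -d '$(replicas_body 0)'"
echo "curl -XPUT ${ES_HOST}/_cluster/settings -d '$(allocation_body false)'"
```

Printing the commands first makes it easy to review them before pasting them into a shell on the node that can reach the master.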



(Nish) #8

Thanks Nate, but this doesn't work. node2 is not the master, so starting it
first didn't make sense. I tried it anyway, and I couldn't execute anything
on the non-master node (node2) until the master was started.

I started node2 (non-master) and ran this:

curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'

After 30s I got this:

{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}

Then I started node1 and, as bloody expected, Elasticsearch copied all the
indexes :frowning: with the message:
"auto importing dangled indices"

I cannot believe I am unable to get this fundamental Elasticsearch feature
working!



(Mark Walkom) #9

Moving data on the OS level without making ES aware can cause difficulties
as you are seeing.
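
As an illustration of what "making ES aware" can look like, here is a rough sketch that builds a request for the cluster reroute API, which asks Elasticsearch itself to move a shard. The index name, shard number and node names are placeholders and are not taken from the real cluster. The block only prints the command.

```shell
# Sketch only: index, shard number and node names are placeholders.
ES_HOST="${ES_HOST:-localhost:9200}"

# Print a cluster reroute body that moves one shard of an index
# from one node to another.
move_body() {
  printf '{"commands":[{"move":{"index":"%s","shard":%s,"from_node":"%s","to_node":"%s"}}]}\n' \
    "$1" "$2" "$3" "$4"
}

# Dry run: print the curl command instead of sending it.
echo "curl -XPOST ${ES_HOST}/_cluster/reroute -d '$(move_body rock 0 node1 node2)'"
```

Because the cluster performs the move, the cluster state and the files on disk stay consistent, which is not the case when directories are copied by hand.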

A few suggestions on how to resolve this and improve things in general:

  1. Set your heap size to 31GB.
  2. Use Oracle's Java, not OpenJDK.
  3. Set bootstrap.mlockall to true; you don't want to swap, ever.
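
A minimal sketch of the heap and mlockall settings above, assuming the ES_HEAP_SIZE variable and the elasticsearch.yml file already shown earlier in this thread. The helper name mlockall_enabled is made up for illustration.

```shell
# Sketch only: values follow the suggestions above; the helper name
# and the config file location are assumptions.

# Heap size picked up by bin/elasticsearch from the environment.
export ES_HEAP_SIZE=31g

# Succeed if the given elasticsearch.yml-style file enables mlockall.
mlockall_enabled() {
  grep -Eq '^[[:space:]]*bootstrap\.mlockall:[[:space:]]*true[[:space:]]*$' "$1"
}
```

For example, mlockall_enabled config/elasticsearch.yml && echo "mlockall is set" confirms the config line. Once the node is up, curl 'localhost:9200/_nodes/process?pretty' should report whether the lock actually took effect, and java -version shows which JVM is in use.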

Given the large number of indexes you have on node1, it's going to be worth
closing some of the older indexes. That should get you to a point where you
can move some of them to a new node and stop the root problem. So try these
steps:

  1. Stop node2.
  2. Delete any data from the second node, to prevent things being auto
    imported again.
  3. Start node1, or restart it if it's running.
  4. Close all your indexes older than a month:
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-open-close.html.
    You can use wildcards in index names to make the update easier. Closing an
    index tells ES not to load its metadata into memory, which will help with
    your OOM issue.
  5. Start node2 and let it join the cluster.
  6. Make sure the cluster is in a green state. If you're not already, use
    something like ElasticHQ, kopf or Marvel to monitor things.
  7. Let the cluster rebalance the current open indexes.
  8. Once that is ok and things are stable, reopen your closed indexes a
    month at a time, and let them rebalance.
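
The close, check and reopen steps above might look roughly like this. It is a dry run that only prints the curl commands. The host and the logstash-2014.03.* pattern are assumptions about how the daily indexes are named, so adjust the pattern to the real index names.

```shell
# Sketch only: host and index pattern are assumptions.
ES_HOST="${ES_HOST:-localhost:9200}"

# Print the curl command that closes every index matching a wildcard pattern.
close_cmd() {
  echo "curl -XPOST '${ES_HOST}/$1/_close'"
}

# Print the curl command that reopens every index matching a wildcard pattern.
open_cmd() {
  echo "curl -XPOST '${ES_HOST}/$1/_open'"
}

# Print the curl command that waits, up to a timeout, for a green cluster.
wait_green_cmd() {
  echo "curl '${ES_HOST}/_cluster/health?wait_for_status=green&timeout=${1:-60s}&pretty'"
}

# Dry run for one month of daily indexes: close, wait for green, reopen.
close_cmd 'logstash-2014.03.*'
wait_green_cmd 120s
open_cmd 'logstash-2014.03.*'
```

Working one month at a time keeps the number of shards that have to recover at once small, which matches the "a month at a time" advice in step 8.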

That should get you back up and running. Once you're there we can go back
to your original post :slight_smile:

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Nish) #10

Hey Mark,
Thanks for the response. I have created two new medium test instances
(1 master, 1 data-only) because I didn't want to mess with the main
dataset. In my test setup I have about 600MB of data in 7 indexes.

After looking around a lot I saw that the directory organization is
/elasticsearch/es/elasticsearch-1.1.1/data/elasticsearchtest/nodes// and
the master node has only one directory there:

(master)

ls /elasticsearch/es/elasticsearch-1.1.1/data/elasticsearchtest/nodes

0

So on node2 I created a "1" directory and moved one index from the master
to the data node. The master now has six indexes in 0 and the data node has
one in 1. When I started Elasticsearch after that, I got to a point where
the master is NOT copying the data back to itself, but now node2 is copying
the master's data and making a "0" directory. Also, I am unable to query
node2's data!
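
To compare what each numbered directory holds before and after a restart, a small helper along these lines may be useful. It assumes the 1.x on-disk layout where each numbered directory under nodes contains an indices directory with one subdirectory per index. The function name is made up for illustration.

```shell
# Sketch only: assumes the layout <data>/<cluster>/nodes/<N>/indices/<index>.

# Print "<node-ordinal> <index-name>" for every index directory on disk.
list_indices_on_disk() {
  local nodes_dir="$1" d
  for d in "$nodes_dir"/*/indices/*/; do
    [ -d "$d" ] || continue   # skip when the glob matches nothing
    d="${d%/}"                # drop the trailing slash
    echo "$(basename "$(dirname "$(dirname "$d")")") $(basename "$d")"
  done
}
```

Running list_indices_on_disk /elasticsearch/es/elasticsearch-1.1.1/data/elasticsearchtest/nodes on each machine shows which ordinal directory each index ended up in.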

On Mon, May 5, 2014 at 9:34 PM, Mark Walkom markw@campaignmonitor.comwrote:

Moving data on the OS level without making ES aware of it can cause
difficulties, as you are seeing.

A few suggestions on how to resolve this and improve things in general:

  1. Set your heap size to 31GB.
  2. Use Oracle's Java, not OpenJDK.
  3. Set bootstrap.mlockall to true; you don't want to swap, ever.
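
For the heap setting, here is a minimal sketch of one way to derive the number instead of guessing. The rule it encodes (half of physical RAM, capped at 31 GB) is a common rule of thumb and an assumption here, not something stated in this thread, and `suggest_heap_mb` is a made-up helper name. On the 60G machine it lands slightly below the 31GB suggested above, which may leave a little more room for native allocations outside the heap.

```shell
# Sketch only: suggest an ES_HEAP_SIZE value in megabytes.
# Assumed rule of thumb (not from this thread): half of physical RAM,
# capped at 31 GB so the JVM keeps using compressed object pointers.
suggest_heap_mb() {
  total_mb=$1
  cap_mb=$((31 * 1024))
  half_mb=$((total_mb / 2))
  if [ "$half_mb" -gt "$cap_mb" ]; then
    echo "$cap_mb"
  else
    echo "$half_mb"
  fi
}

# Example: the 60 GB machine from the original post.
echo "export ES_HEAP_SIZE=$(suggest_heap_mb $((60 * 1024)))m"
```

Whether mlockall actually took effect can be checked after startup with `curl 'localhost:9200/_nodes/process?pretty'` and looking at the `mlockall` field for each node.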

Given the large number of indexes you have on node1, and to get to a point
where you can move some of these to a new node and stop the root problem,
it's going to be worth closing some of the older indexes. So try these
steps:

  1. Stop node2.
  2. Delete any data from the second node, to prevent things being auto
    imported again.
  3. Start node1, or restart it if it's running.
  4. Close all your indexes older than a month -
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-open-close.html.
    You can use wildcards in index names to make the update easier. What this
    will do is tell ES to not load the index metadata into memory, which will
    help with your OOM issue.
  5. Start node2 and let it join the cluster.
  6. Make sure the cluster is in a green state. If you're not already,
    use something like ElasticHQ, kopf or Marvel to monitor things.
  7. Let the cluster rebalance the current open indexes.
  8. Once that is ok and things are stable, reopen your closed indexes a
    month at a time, and let them rebalance.
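
Steps 4, 6 and 8 can be sketched as the calls below. The script only prints the commands, so nothing gets closed by accident. The address `localhost:9200`, the helper names, and the `logstash-YYYY.MM.dd` index naming are assumptions to adapt to your setup.

```shell
# Sketch only: print (do not run) the calls for steps 4, 6 and 8.
ES_URL=localhost:9200   # assumed address of any node in the cluster

# Wildcard matching every daily index of one month, e.g. logstash-2013.12.*
month_pattern() {
  echo "logstash-$1.$2.*"
}

close_cmd()  { echo "curl -XPOST '$ES_URL/$(month_pattern "$1" "$2")/_close'"; }
open_cmd()   { echo "curl -XPOST '$ES_URL/$(month_pattern "$1" "$2")/_open'"; }
health_cmd() { echo "curl '$ES_URL/_cluster/health?pretty'"; }

close_cmd 2013 12   # step 4: close one month of old indexes
health_cmd          # step 6: look for "status" : "green" in the reply
open_cmd 2013 12    # step 8: reopen that month later
```

Pass the month zero-padded (for example `2013 08`), since the helper pastes its arguments into the pattern as plain text.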

That should get you back up and running. Once you're there we can go back
to your original post :slight_smile:

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 6 May 2014 11:15, Nishchay Shah electronish@gmail.com wrote:

Thanks Nate, but this doesn't work. node2 is not the master, so starting
it first didn't make sense. Anyway, I tried it, and I couldn't execute
anything on a non-master node (node2) until the master was started.

I started node2 (non-master) and ran this:
curl -XPUT localhost:9200/_cluster/settings -d
'{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
After 30s I got this:
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}
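
The 503 above is what a node answers when no master has been elected yet, and a cluster settings update needs an elected master to go through. A rough sketch of gating the call on that condition follows. `wait_for` and `master_elected` are hypothetical helper names, the address is assumed, and the probe assumes that `_cat/master` returns an HTTP error while there is no master.

```shell
# Sketch only: retry a probe command until it succeeds or attempts run out.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  while [ "$attempts" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}

# Hypothetical probe: assumed to fail (curl -f) until a master is elected.
master_elected() {
  curl -sf 'localhost:9200/_cat/master'
}

# Intended use, left commented out so the sketch has no side effects:
# wait_for 10 3 master_elected && curl -XPUT localhost:9200/_cluster/settings -d '...'
```

With a data-only node2, the probe can only succeed once a master-eligible node such as node1 is up, which matches what was observed here.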

I started node1 and as bloody expected elasticsearch copied all the
indexes :frowning: ..
"auto importing dangled indices"

I cannot believe I am unable to get this fundamental elasticsearch
feature working !

On Mon, May 5, 2014 at 4:25 PM, Nate Fox thefox@gmail.com wrote:

Get node2 running with rock. Then issue a disable_allocation and then
bring up node1.
curl -XPUT localhost:9200/_cluster/settings -d
'{"transient":{"cluster.routing.allocation.disable_allocation":true}}'

From there, adjust the replica settings on the indexes down to 0 so they
don't copy. Once that's set, change disable_allocation to false.
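
Put together, the sequence described here might look like the sketch below. It only prints the three calls in order. The address and the helper names are assumptions, and `PUT /_settings` without an index name applies the replica change to every index. The 1.x documentation also describes a newer `cluster.routing.allocation.enable` setting, so check which one your version expects.

```shell
# Sketch only: print the three calls in order; nothing is sent to a cluster.
ES_URL=localhost:9200   # assumed address

# JSON body that turns shard allocation off (true) or back on (false).
allocation_body() {
  printf '{"transient":{"cluster.routing.allocation.disable_allocation":%s}}' "$1"
}

# JSON body that sets the replica count on the targeted indexes.
replicas_body() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

echo "curl -XPUT $ES_URL/_cluster/settings -d '$(allocation_body true)'"
echo "curl -XPUT $ES_URL/_settings -d '$(replicas_body 0)'"
echo "curl -XPUT $ES_URL/_cluster/settings -d '$(allocation_body false)'"
```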

On Mon, May 5, 2014 at 1:19 PM, Nish electronish@gmail.com wrote:

"- Fire up both nodes, make sure they both have the same cluster
name"
<= This is exactly the step I described in my second message, and it is
where Elasticsearch is messing up. When I move an index to a new node,
delete it from the master, and then start the master node and the other
data node, the master logs this message:
"auto importing dangled indices"
This means the master is now copying the "deleted" index, which exists only
on the other node, back to itself!

Basically this is what happens:

  1. Node1 (master) has: rock, paper, scissors
  2. I move rock from node1 to node2 (I verify by starting ONLY
    node1 and I can see that I am missing the data that was originally in the
    "rock" index; as expected, all good)
  3. So node1 now has paper, scissors
  4. I start node2 with ONLY the "rock" index (verified independently, it
    works)
  5. Then I start node1 (master) and node2 (data)
  6. Node1 says "hey, I don't have rock, but node2 has it, let me
    copy it to myself"
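
One way to get the same split without touching the data directory is to let Elasticsearch relocate the index itself through shard allocation filtering. A minimal sketch, assuming the index is called `rock`, the target node was started with `node.name: node2`, and the cluster answers on `localhost:9200`. `pin_index_cmd` is a hypothetical helper that only prints the call.

```shell
# Sketch only: build the settings call that pins one index to one node.
# "rock" and "node2" are the placeholder names from the steps above; the
# node name must match the node.name the target node was started with.
pin_index_cmd() {
  index=$1
  node=$2
  body=$(printf '{"index.number_of_replicas":0,"index.routing.allocation.require._name":"%s"}' "$node")
  echo "curl -XPUT 'localhost:9200/$index/_settings' -d '$body'"
}

pin_index_cmd rock node2
```

Setting the replica count to 0 in the same call is a design choice for this scenario: with a replica still configured, a second copy of each shard would be expected to stay on the other node.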

On Monday, May 5, 2014 3:44:17 PM UTC-4, Nate Fox wrote:

You might turn off the bootstrap.mlockall flag just for now - it'll
make ES swap a ton, but your error message looks like an OS level issue.
Make sure you have lots of swap available and grab some coffee.

What I'd also try if turning off bootstrap.mlockall doesn't work:

  • Tarball the entire data directory and save the tarball somewhere
    (unless you don't care about the data)
  • Set 31GB for your ES heap. There's plenty of docs out there that say
    not to go over 32GB of heap, because above that the JVM can no longer
    use compressed object pointers.
  • Copy the entire data dir to node2
  • Go into the data dir on node1 and delete half of the indexes
  • Go into the data dir on node2 and delete the other half of the
    indexes
  • Fire up both nodes, make sure they both have the same cluster name
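
The backup in the first bullet could be sketched like this. `backup_data_dir` is a hypothetical helper, the archive path in the example is made up, and the node should be stopped first so files are not changing while tar reads them.

```shell
# Sketch only: archive an Elasticsearch data directory before touching it.
# Stop the node first so the files are not changing while tar reads them.
backup_data_dir() {
  data_dir=$1
  archive=$2
  # -C switches to the parent so the archive holds "data/..." relative paths.
  tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")"
}

# Example with the data path used in this thread (commented out on purpose;
# /backup/es-data.tar.gz is a made-up destination):
# backup_data_dir /elasticsearch/es/elasticsearch-1.1.1/data /backup/es-data.tar.gz
```

Listing the archive with `tar -tzf` afterwards is a cheap way to confirm it contains what you expect before deleting anything.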

I have no idea if this'll work, I'm by no means an ES expert. :slight_smile:

On Mon, May 5, 2014 at 12:32 PM, Nish elect...@gmail.com wrote:

Currently I have 279 indexes on a single node, and elasticsearch
runs for a few minutes and then dies. I only have 60G of RAM, and as far as
I know 60% is the max that one should allocate to elasticsearch. I tried
allocating 38G; it lasted a few more minutes and then died.

(I think there's some state files that tell ES/Lucene which indexes
are on disk)
=> Where are these? How do I fix it so that it doesn't
move all indexes to all nodes? I want to split the ~280 indexes across two
nodes of 140 each. So far I am not able to achieve this, as the master keeps
moving indexes to itself!

On Monday, May 5, 2014 3:25:05 PM UTC-4, Nate Fox wrote:

How many indexes do you have? It almost looks like the system itself
can't allocate the RAM needed.
You might try jacking up the nofile limit to something like 999999 as
well. I'd definitely go with a 31g heap size.
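
Before raising the limit, it may help to confirm what the shell that launches Elasticsearch actually gets. A small sketch follows; `nofile_ok` is a made-up helper, and 65536 is simply the value from the limits shown in the original post.

```shell
# Sketch only: compare this shell's open-file limit with a wanted minimum.
nofile_ok() {
  wanted=$1
  current=$(ulimit -n)
  [ "$current" = "unlimited" ] || [ "$current" -ge "$wanted" ]
}

if nofile_ok 65536; then
  echo "nofile limit is at least 65536"
else
  echo "nofile limit is $(ulimit -n), below 65536"
fi
```

Run it as the same user and from the same kind of session that starts Elasticsearch, since limits set in limits.conf are applied per login session.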

As for moving indexes, you might be able to copy the entire data
store, then remove some (I think there's some state files that tell
ES/Lucene which indexes are on disk), so it might recover if it's missing
some and sees the others on another node?

As for your other questions, how many nodes you need depends on usage,
especially search activity while indexing. We have 230 indexes
(1740 shards) on 8 data nodes (5.7TB / 6.1B docs), so it can definitely
handle a lot more than what you're throwing at it. We don't search often, nor
do we load a ton of data at once.
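
To put the two setups side by side, here is a rough back-of-the-envelope calculation. The figure of 5 shards per index is an assumption (the Elasticsearch 1.x default), so check the real index settings before relying on it; `total_shards` is a made-up helper.

```shell
# Sketch only: rough shard counts for comparison.
# 5 shards per index is an assumption (the 1.x default), not a measured value.
total_shards() {
  echo $(( $1 * $2 ))
}

echo "single node: $(total_shards 279 5) primary shards for 279 indexes"
echo "8-node cluster: about $((1740 / 8)) shards per data node"
```

Under that assumption the single node would be carrying several times more shards than each data node in the 8-node cluster described above, even though the total data volume is much smaller.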


(Mark Walkom) #11

Don't copy indexes on the OS level!

Is your new cluster balancing the shards?
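
One way to answer that question from the command line is to count started shard copies per node from the `_cat/shards` output. A minimal sketch, assuming the node name is the last column and contains no spaces; `shards_per_node` is a hypothetical helper and the address is assumed.

```shell
# Sketch only: count STARTED shard copies per node from _cat/shards output.
# Assumes the state is column 4 and the node name is the last column,
# which only works for node names without spaces.
shards_per_node() {
  awk '$4 == "STARTED" { count[$NF]++ } END { for (n in count) print n, count[n] }' | sort
}

# Against a live cluster (commented out so the sketch has no side effects):
# curl -s 'localhost:9200/_cat/shards' | shards_per_node
```

Roughly equal counts per node suggest the cluster is balancing; unassigned shards are skipped by the filter and are worth checking separately.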

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


(Nish) #12

Probably not.

I deleted all data from the slave (the data node), restarted both servers, and now I see this:

*Master: *
[root@ip-10-169-36-251 logstash-2013.12.22]# du -h --max-depth=1
16M ./0
16M ./1
8.0K ./_state
15M ./4
15M ./3
15M ./2
75M .

*Data: *

[root@ip-10-186-152-19 logstash-2013.12.22]# du -h --max-depth=1
16M ./0
16M ./1
15M ./4
15M ./3
15M ./2
75M .
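
Identical shard directories (0 to 4) on both machines are what one would expect if this index has 5 shards and 1 replica, because each node then holds one copy of every shard. That replica count is an assumption (the 1.x default), not something shown in the listing. The arithmetic, with a made-up helper name:

```shell
# Sketch only: copies of one index's shards per node when spread evenly.
# The shard, replica and node counts passed in are assumptions.
copies_per_node() {
  shards=$1
  replicas=$2
  nodes=$3
  total=$(( shards * (1 + replicas) ))
  echo $(( (total + nodes - 1) / nodes ))   # integer division, rounded up
}

echo "5 shards, 1 replica, 2 nodes: $(copies_per_node 5 1 2) copies per node"
echo "5 shards, 0 replicas, 2 nodes: up to $(copies_per_node 5 0 2) copies per node"
```

So with replicas left at 1, both nodes carrying the full 75M is consistent with a healthy two-node cluster; the data would only be split between the nodes with replicas set to 0.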

On Mon, May 5, 2014 at 10:53 PM, Mark Walkom markw@campaignmonitor.comwrote:

Don't copy indexes on the OS level!

Is your new cluster balancing the shards?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 6 May 2014 12:46, Nishchay Shah electronish@gmail.com wrote:

Hey Mark,
Thanks for the response. I have currently created two new medium test
instances (1 master 1 data only) because I didn't want to mess with the
main dataset. In my test setup, I have about 600MB of data ; 7 indexes

After looking around a lot I saw that the directory organization is
/elasticsearch/es/elasticsearch-1.1.1/data/elasticsearchtest/nodes// and the master node has only 1 directory

(master)

ls /elasticsearch/es/elasticsearch-1.1.1/data/elasticsearchtest/nodes

0

So on node2 I created a "1" directory and moved 1 index from master to
data ; So master now has six indexes in 0 and data has one in 1.
When I started elasticsearch after that I got to a point where the master
is not NOT copying the data back to itself.. but now node2 is copying
master's data and making a "0" directory ; Also, I am unable to query the
node2's data !

On Mon, May 5, 2014 at 9:34 PM, Mark Walkom markw@campaignmonitor.comwrote:

Moving data on the OS level without making ES aware can cause
difficulties as you are seeing.

A few suggestions on how to resolve this and improve things in general;

  1. Set your heap size to 31GB.
  2. Use Oracle's java, not OpenJDK.
  3. Set bootstrap.mlockall to true, you don't want to swap, ever.

Given the large number of indexes you have on node1, and to get to a
point where you can move some of these to a new node and stop the root
problem, it's going to be worth closing some of the older indexes. So try
these steps;

  1. Stop node2.
  2. Delete any data from the second node, to prevent things being
    auto imported again.
  3. Start node1, or restart it if it's running.
  4. Close all your indexes older than a month -
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-open-close.html.
    You can use wildcards in index names to make the update easier. What this
    will do is tell ES to not load the index metadata into memory, which will
    help with your OOM issue.
  5. Start node2 and let it join the cluster.
  6. Make sure the cluster is in a green state. If you're not already,
    use something like ElasticHQ, kopf or Marvel to monitor things.
  7. Let the cluster rebalance the current open indexes.
  8. Once that is ok and things are stable, reopen your closed indexes
    a month at a time, and let them rebalance.

That should get you back up and running. Once you're there we can go
back to your original post :slight_smile:

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 6 May 2014 11:15, Nishchay Shah electronish@gmail.com wrote:

Thanks Nate, but this doesn't work. node2 is not the master. So
starting it first didn't make sense, anyway I tried it and I couldn't
execute anything on a nonmaster node (node2) unless master was started

I started node2 (non master) and ran this: curl -XPUT
localhost:9200/_cluster/settings -d
'{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
after 30s I got this:
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}

I started node1 and as bloody expected elasticsearch copied all the
indexes :frowning: ..
"auto importing dangled indices"

I cannot believe I am unable to get this fundamental elasticsearch
feature working !

On Mon, May 5, 2014 at 4:25 PM, Nate Fox thefox@gmail.com wrote:

Get node2 running with rock. Then issue a disable_allocation and then
bring up node1.
curl -XPUT localhost:9200/_cluster/settings -d
'{"transient":{"cluster.routing.allocation.disable_allocation":true}}'

From there, adjust the replica settings on the indexes down to 0 so
they dont copy. Once thats set, change disable_allocation to false.

On Mon, May 5, 2014 at 1:19 PM, Nish electronish@gmail.com wrote:

"- Fire up both nodes, make sure they both have the same cluster
name"
<= As I wrote in my second message, this is exactly where
Elasticsearch is messing up. When I move the index to a new node, delete
that index from the master, and then start the master node and the other
data node, the master logs this message:
"auto importing dangled indices"
This means the master is now copying the "deleted" index, which exists
only on the other node, to itself!

Basically this is what happens:

  1. Node1 (master) has rock, paper, scissors
  2. I move rock from Node1 to Node2 (I verify by starting ONLY
    node1 and I can see that I am missing the data that was originally in
    the "rock" index, as expected, all good)
  3. So node1 now has paper, scissors
  4. I start Node2 with ONLY the "rock" index (verified independently, it
    works)
  5. Then I start node1 (master) and node2 (data)
  6. Node1 says "hey I don't have rock, but node2 has it, let
    me copy it to myself"
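
The behaviour in the steps above can be pictured with a toy model: an index that exists on some node's disk but is missing from the cluster state is "dangling", and it gets imported back into the cluster state when that node joins. This is a simplified sketch of the idea only, not Elasticsearch's actual code; the function and variable names are made up for illustration.

```python
def dangling_indices(on_disk: set, cluster_state: set) -> set:
    """Indexes found on a node's disk that the cluster state does not know about."""
    return on_disk - cluster_state


# After "rock" was moved away from node1 on the OS level, the master only
# knows about paper and scissors.
cluster_state = {"paper", "scissors"}
node2_disk = {"rock"}

found = dangling_indices(node2_disk, cluster_state)
cluster_state |= found  # the "auto importing dangled indices" step

print(sorted(found))
print(sorted(cluster_state))
```

Once the index is back in the cluster state, shard allocation and rebalancing decide where its shards live, which would explain data showing up on a node it was deleted from.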

On Monday, May 5, 2014 3:44:17 PM UTC-4, Nate Fox wrote:

You might turn off the bootstrap.mlockall flag just for now - it'll
make ES swap a ton, but your error message looks like an OS level issue.
Make sure you have lots of swap available and grab some coffee.

What I'd also try if turning off bootstrap.mlockall doesn't work:

  • Tarball the entire data directory and save the tarball somewhere
    (unless you don't care about the data)
  • Set 31GB for your ES heap. There's plenty of docs out there that
    say not to go over 32GB, because above that the JVM can no longer
    use compressed object pointers.
  • Copy the entire data dir to node2
  • Go into the data dir on node1 and delete half of the indexes
  • Go into the data dir on node2 and delete the other half of the
    indexes
  • Fire up both nodes, make sure they both have the same cluster name

I have no idea if this'll work, I'm by no means an ES expert. :slight_smile:

On Mon, May 5, 2014 at 12:32 PM, Nish elect...@gmail.com wrote:

Currently I have 279 indexes on a single node, and elasticsearch
starts for a few minutes and then dies. I only have 60G of RAM, and as far as
I know 60% is the max that one should allocate to elasticsearch. I tried
allocating 38G; it lasted a few more minutes and then died.

(I think there's some state files that tell ES/Lucene which
indexes are on disk)
=> Where is this? How do I fix it so that
it doesn't move all indexes to all nodes? I want to split the ~280 indexes
across two nodes, 140 each. So far I am not able to achieve this, as the
master keeps moving indexes to itself!
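
The two heap rules mentioned in this thread (a fraction of RAM, and the 31GB ceiling suggested elsewhere in the replies) can be combined in a small calculation. This is only a sketch: the function name is hypothetical, and the 60% figure is the poster's own rule of thumb, not a verified recommendation.

```python
def suggested_heap_gb(ram_gb: float, fraction: float, cap_gb: int = 31) -> int:
    """Heap size in whole GB: a fraction of RAM, but never above the cap."""
    return min(int(ram_gb * fraction), cap_gb)


# 60G machine with the 60% rule of thumb quoted above: 36G, capped to 31G.
heap = suggested_heap_gb(60, 0.6)
print(f"ES_HEAP_SIZE={heap}g")

# The 38G that was tried is above the cap.
print(38 > 31)
```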

On Monday, May 5, 2014 3:25:05 PM UTC-4, Nate Fox wrote:

How many indexes do you have? It almost looks like the system
itself can't allocate the RAM needed.
You might try jacking up the nofile to something like 999999 as
well. I'd definitely go with a 31g heap size.

As for moving indexes, you might be able to copy the entire data
store, then remove some (I think there's some state files that tell
ES/Lucene which indexes are on disk), so it might recover if it's missing
some and sees the others on another node?

As for your other questions, how many nodes you need depends on usage,
especially search activity while indexing. We have 230 indexes
(1740 shards) on 8 data nodes (5.7TB / 6.1B docs), so it can definitely
handle a lot more than what you're throwing at it. We don't search often, nor
do we load a ton of data at once.

On Sunday, May 4, 2014 7:13:09 AM UTC-7, Nish wrote:

elasticsearch is set up as a single-node instance on a machine with 60G RAM
and 32*2.6GHz. I am actively indexing historic data with logstash. It
worked well with ~300 million documents (search and indexing were doing ok),
but all of a sudden es fails to start and keep itself up. It runs for
a few minutes and I can query, but then it fails with an out of memory error.
I monitor the memory and at least 12G of memory is available when it fails.
I had set the es_heap_size to 31G and then reduced it to 28, 24 and 18, and
I get the same error every time (see dump below)

*My security limits are as under (this is a test/POC server thus
"root" user) *

root soft nofile 65536
root hard nofile 65536
root - memlock unlimited

*ES settings *
config]# grep -v "^#" elasticsearch.yml | grep -v "^$"
bootstrap.mlockall: true

echo $ES_HEAP_SIZE
18432m

---DUMP----

bin/elasticsearch

[2014-05-04 13:30:12,653][INFO ][node ] [Sabretooth] version[1.1.1], pid[19309], build[f1585f0/2014-04-16T14:27:12Z]
[2014-05-04 13:30:12,653][INFO ][node ] [Sabretooth] initializing ...
[2014-05-04 13:30:12,669][INFO ][plugins ] [Sabretooth] loaded [], sites []
[2014-05-04 13:30:15,390][INFO ][node ] [Sabretooth] initialized
[2014-05-04 13:30:15,390][INFO ][node ] [Sabretooth] starting ...
[2014-05-04 13:30:15,531][INFO ][transport ] [Sabretooth] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.109.136.59:9300]}
[2014-05-04 13:30:18,553][INFO ][cluster.service ] [Sabretooth] new_master [Sabretooth][eocFkTYMQnSTUar94A2vHw][ip-10-109-136-59][inet[/10.109.136.59:9300]], reason: zen-disco-join (elected_as_master)
[2014-05-04 13:30:18,579][INFO ][discovery ] [Sabretooth] elasticsearch/eocFkTYMQnSTUar94A2vHw
[2014-05-04 13:30:18,790][INFO ][http ] [Sabretooth] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.109.136.59:9200]}
[2014-05-04 13:30:19,976][INFO ][gateway ] [Sabretooth] recovered [278] indices into cluster_state
[2014-05-04 13:30:19,984][INFO ][node ] [Sabretooth] started
OpenJDK 64-Bit Server VM warning: Attempt to protect stack guard pages failed.
OpenJDK 64-Bit Server VM warning: Attempt to deallocate stack guard pages failed.
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007f7c70000, 196608, 0) failed; error='Cannot allocate memory' (errno=12)

There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (malloc) failed to allocate 196608 bytes for committing reserved memory.
An error report file with more information is saved as:
/tmp/jvm-19309/hs_error.log


*user untergeek on #logstash told me that I have reached a max
number of indices on a single node. Here are my questions: *

  1. Can I move half of my indexes to a new node? If yes, how do I
    do that without compromising the indexes?
  2. Logstash makes 1 index per day and I want to keep 2 years
    of data indexed. Can I combine multiple indexes into one, e.g. one
    index per month? That would mean I never have more than 24 indexes.
  3. How many nodes are ideal for 24 months of data at ~1.5G a day?
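
A back-of-the-envelope comparison of the daily and monthly layouts from question 2 might look like the sketch below. The figure of 5 shards per index is an assumption (it matches the five shard directories, 0 to 4, visible in the `du` listings posted elsewhere in this thread), and the helper name is hypothetical.

```python
DAYS_RETAINED = 2 * 365   # two years of daily Logstash indexes
SHARDS_PER_INDEX = 5      # assumption: 5 primary shards per index, no replicas
GB_PER_DAY = 1.5          # from question 3 above


def monthly_name(daily_name: str) -> str:
    """Map a daily index name to a monthly one by dropping the day part."""
    return daily_name.rsplit(".", 1)[0]


daily_indices = DAYS_RETAINED
monthly_indices = 24
total_gb = GB_PER_DAY * DAYS_RETAINED

print("daily layout:  ", daily_indices, "indexes,", daily_indices * SHARDS_PER_INDEX, "shards")
print("monthly layout:", monthly_indices, "indexes,", monthly_indices * SHARDS_PER_INDEX, "shards")
print("total data:    ", total_gb, "GB")
print(monthly_name("logstash-2013.12.05"))
```

Under these assumptions the monthly layout holds the same data in roughly a thirtieth of the shards. The sketch only covers the counting; moving existing daily data into monthly indexes would still require reindexing it.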

--
You received this message because you are subscribed to a topic in
the Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/cEimyMnhSv0/unsubscribe.
To unsubscribe from this group and all its topics, send an email
to elasticsearc...@googlegroups.com.

To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/564e2951-ed54-4f34-97a9-4de88f187a7a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



(Nish) #13

FYI settings:
Master:
[root@ip-10-169-36-251 logstash-2013.12.05]# grep -vE "^$|^#" /xx/elasticsearch-1.1.1/config/elasticsearch.yml
cluster.name: elasticsearchtest
node.name: "node1"
node.master: true
node.data: true
index.number_of_replicas: 0
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.169.36.251", "10.186.152.19"]
Non Master:
[root@ip-10-186-152-19 logstash-2013.12.05]# grep -vE "^$|^#" /elasticsearch/es/elasticsearch-1.1.1/config/elasticsearch.yml
cluster.name: elasticsearchtest
node.name: "node2"
node.master: false
node.data: true
index.number_of_replicas: 0
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.169.36.251","10.186.152.19"]

On Mon, May 5, 2014 at 11:01 PM, Nishchay Shah electronish@gmail.com wrote:

Probably not.

I deleted all data from slave and restarted both servers and I see this:

*Master: *
[root@ip-10-169-36-251 logstash-2013.12.22]# du -h --max-depth=1
16M ./0
16M ./1
8.0K ./_state
15M ./4
15M ./3
15M ./2
75M .

*Data: *

[root@ip-10-186-152-19 logstash-2013.12.22]# du -h --max-depth=1
16M ./0
16M ./1
15M ./4
15M ./3
15M ./2
75M .

On Mon, May 5, 2014 at 10:53 PM, Mark Walkom markw@campaignmonitor.com wrote:

Don't copy indexes on the OS level!

Is your new cluster balancing the shards?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 6 May 2014 12:46, Nishchay Shah electronish@gmail.com wrote:

Hey Mark,
Thanks for the response. I have currently created two new medium test
instances (1 master 1 data only) because I didn't want to mess with the
main dataset. In my test setup, I have about 600MB of data ; 7 indexes

After looking around a lot I saw that the directory organization is
/elasticsearch/es/elasticsearch-1.1.1/data/elasticsearchtest/nodes// and the master node has only 1 directory

(master)

ls /elasticsearch/es/elasticsearch-1.1.1/data/elasticsearchtest/nodes

0

So on node2 I created a "1" directory and moved 1 index from master to
data, so master now has six indexes in 0 and data has one in 1.
When I started elasticsearch after that, I got to a point where the
master is NOT copying the data back to itself, but now node2 is
copying master's data and making a "0" directory. Also, I am unable to
query node2's data!

On Mon, May 5, 2014 at 9:34 PM, Mark Walkom markw@campaignmonitor.com wrote:

Moving data on the OS level without making ES aware can cause
difficulties, as you are seeing.

A few suggestions on how to resolve this and improve things in
general:

  1. Set your heap size to 31GB.
  2. Use Oracle's Java, not OpenJDK.
  3. Set bootstrap.mlockall to true; you don't want to swap, ever.

Given the large number of indexes you have on node1, it's going to be
worth closing some of the older indexes. That gets you to a point where
you can move some of them to a new node and stop the root problem. So try
these steps:

  1. Stop node2.
  2. Delete any data from the second node, to prevent things being
    auto imported again.
  3. Start node1, or restart it if it's running.
  4. Close all your indexes older than a month -
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-open-close.html.
    You can use wildcards in index names to make the update easier. What this
    will do is tell ES to not load the index metadata into memory, which will
    help with your OOM issue.
  5. Start node2 and let it join the cluster.
  6. Make sure the cluster is in a green state. If you're not
    already, use something like ElasticHQ, kopf or Marvel to monitor things.
  7. Let the cluster rebalance the current open indexes.
  8. Once that is ok and things are stable, reopen your closed
    indexes a month at a time, and let them rebalance.
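
Step 4 above (closing everything older than a month with wildcards) could be sketched as below. The code only generates one wildcard pattern per month and prints the matching close requests; nothing is sent to a cluster. It assumes Logstash-style daily names such as `logstash-2013.12.05`, and the function name, the dates, and the month-granular cutoff are illustrative choices.

```python
from datetime import date


def months_to_close(oldest: date, today: date, keep_months: int = 1) -> list:
    """One wildcard pattern per calendar month that should be closed.

    The current month and the previous keep_months months stay open.
    """
    patterns = []
    year, month = oldest.year, oldest.month
    # First month to keep open, counted in months since year 0.
    cutoff = today.year * 12 + (today.month - 1) - keep_months
    while year * 12 + (month - 1) < cutoff:
        patterns.append(f"logstash-{year:04d}.{month:02d}.*")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return patterns


# Hypothetical dates: oldest data from December 2013, run on 6 May 2014.
for pattern in months_to_close(date(2013, 12, 1), date(2014, 5, 6)):
    print(f"curl -XPOST 'localhost:9200/{pattern}/_close'")
```

Reopening "a month at a time" in step 8 would be the same loop in reverse with `_open` in place of `_close`, waiting for the cluster to settle between months.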

That should get you back up and running. Once you're there we can go
back to your original post :slight_smile:

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Mark Walkom) #14

You need to install a monitoring plugin to gain better insight into what is
happening. It makes it a lot easier to see cluster/node/index state
visually, and it replaces your shell commands, which may not be 100%
representative of what ES is actually doing.

I suggest ElasticHQ and Marvel.
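
Besides a plugin, Elasticsearch's own view of the cluster is available from the cluster health endpoint (`GET /_cluster/health`). The sketch below only interprets such a response once it has been fetched; the sample values are made up, and the "ready" rule (green with no shards moving) is an assumption about what "stable" should mean here.

```python
def ready_for_next_step(health: dict) -> bool:
    """True when the cluster is green and no shards are moving or unassigned."""
    return (
        health.get("status") == "green"
        and health.get("relocating_shards", 0) == 0
        and health.get("initializing_shards", 0) == 0
        and health.get("unassigned_shards", 0) == 0
    )


# Hypothetical response while shards are still being rebalanced.
rebalancing = {
    "cluster_name": "elasticsearchtest",
    "status": "green",
    "number_of_nodes": 2,
    "relocating_shards": 3,
    "initializing_shards": 0,
    "unassigned_shards": 0,
}
# Hypothetical response once things have settled.
settled = dict(rebalancing, relocating_shards=0)

print(ready_for_next_step(rebalancing))
print(ready_for_next_step(settled))
```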

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 6 May 2014 13:06, Nishchay Shah electronish@gmail.com wrote:

FYI settings:
Master:
[root@ip-10-169-36-251 logstash-2013.12.05]# grep -vE "^$|^#"
/xx/elasticsearch-1.1.1/config/elasticsearch.yml
cluster.name: elasticsearchtest
node.name: "node1"
node.master: true
node.data: true
index.number_of_replicas: 0
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.169.36.251", "10.186.152.19"]
Non Master
[root@ip-10-186-152-19 logstash-2013.12.05]# grep -vE "^$|^#"
/elasticsearch/es/elasticsearch-1.1.1/config/elasticsearch.yml
cluster.name: elasticsearchtest
node.name: "node2"
node.master: false
node.data: true
index.number_of_replicas: 0
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.169.36.251","10.186.152.19"]

On Mon, May 5, 2014 at 11:01 PM, Nishchay Shah electronish@gmail.comwrote:

Probably not.

I deleted all data from slave and restarted both servers and I see this:

*Master: *
[root@ip-10-169-36-251 logstash-2013.12.22]# du -h --max-depth=1
16M ./0
16M ./1
8.0K ./_state
15M ./4
15M ./3
15M ./2
75M .

*Data: *

[root@ip-10-186-152-19 logstash-2013.12.22]# du -h --max-depth=1
16M ./0
16M ./1
15M ./4
15M ./3
15M ./2
75M .

On Mon, May 5, 2014 at 10:53 PM, Mark Walkom markw@campaignmonitor.comwrote:

Don't copy indexes on the OS level!

Is your new cluster balancing the shards?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 6 May 2014 12:46, Nishchay Shah electronish@gmail.com wrote:

Hey Mark,
Thanks for the response. I have currently created two new medium test
instances (1 master 1 data only) because I didn't want to mess with the
main dataset. In my test setup, I have about 600MB of data ; 7 indexes

After looking around a lot I saw that the directory organization is
/elasticsearch/es/elasticsearch-1.1.1/data/elasticsearchtest/nodes// and the master node has only 1 directory

(master)

ls /elasticsearch/es/elasticsearch-1.1.1/data/elasticsearchtest/nodes

0

So on node2 I created a "1" directory and moved 1 index from master to
data ; So master now has six indexes in 0 and data has one in 1.
When I started elasticsearch after that I got to a point where the
master is not NOT copying the data back to itself.. but now node2 is
copying master's data and making a "0" directory ; Also, I am unable to
query the node2's data !

On Mon, May 5, 2014 at 9:34 PM, Mark Walkom markw@campaignmonitor.comwrote:

Moving data on the OS level without making ES aware can cause
difficulties as you are seeing.

A few suggestions on how to resolve this and improve things in
general;

  1. Set your heap size to 31GB.
  2. Use Oracle's java, not OpenJDK.
  3. Set bootstrap.mlockall to true, you don't want to swap, ever.

Given the large number of indexes you have on node1, and to get to a
point where you can move some of these to a new node and stop the root
problem, it's going to be worth closing some of the older indexes. So try
these steps;

  1. Stop node2.
  2. Delete any data from the second node, to prevent things being
    auto imported again.
  3. Start node1, or restart it if it's running.
  4. Close all your indexes older than a month -
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-open-close.html.
    You can use wildcards in index names to make the update easier. What this
    will do is tell ES to not load the index metadata into memory, which will
    help with your OOM issue.
  5. Start node2 and let it join the cluster.
  6. Make sure the cluster is in a green state. If you're not
    already, use something like ElasticHQ, kopf or Marvel to monitor things.
  7. Let the cluster rebalance the current open indexes.
  8. Once that is ok and things are stable, reopen your closed
    indexes a month at a time, and let them rebalance.

That should get you back up and running. Once you're there we can go
back to your original post :slight_smile:
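
A rough sketch of step 4 above, assuming the daily Logstash naming (logstash-YYYY.MM.DD) seen elsewhere in this thread and a node listening on localhost:9200. The helper names close_cmd_for_month and open_cmd_for_month are made up for illustration; they only print the curl command so it can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helpers: print (do not run) the curl call that closes or
# reopens every daily Logstash index of one month via a wildcard.
# Assumption: indices are named logstash-YYYY.MM.DD, ES is on localhost:9200.

close_cmd_for_month() {
    month="$1"   # e.g. 2013.12
    printf "curl -XPOST 'localhost:9200/logstash-%s.*/_close'\n" "$month"
}

open_cmd_for_month() {
    month="$1"   # e.g. 2013.12
    printf "curl -XPOST 'localhost:9200/logstash-%s.*/_open'\n" "$month"
}

# Example (prints the command for December 2013):
#   close_cmd_for_month 2013.12
```

Closed indexes cannot be searched until they are reopened, so reopening a month at a time (step 8) would use open_cmd_for_month in the same way.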

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 6 May 2014 11:15, Nishchay Shah electronish@gmail.com wrote:

Thanks Nate, but this doesn't work. node2 is not the master, so
starting it first didn't make sense. Anyway, I tried it and I couldn't
execute anything on a non-master node (node2) unless master was started.

I started node2 (non-master) and ran this:
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
After 30s I got this:
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}

I started node1 and as bloody expected elasticsearch copied all the
indexes :frowning: ..
"auto importing dangled indices"

I cannot believe I am unable to get this fundamental elasticsearch
feature working !

On Mon, May 5, 2014 at 4:25 PM, Nate Fox thefox@gmail.com wrote:

Get node2 running with rock. Then issue a disable_allocation and
then bring up node1.
curl -XPUT localhost:9200/_cluster/settings -d
'{"transient":{"cluster.routing.allocation.disable_allocation":true}}'

From there, adjust the replica settings on the indexes down to 0 so
they don't copy. Once that's set, change disable_allocation to false.
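
A small sketch of the request bodies involved, with two caveats. First, in the 1.x series the disable_allocation setting is deprecated in favour of cluster.routing.allocation.enable (values such as "none" and "all"), which is what this sketch uses. Second, both requests have to reach a cluster that has an elected master. The helper names allocation_body and replicas_body are hypothetical; they only print JSON.

```shell
#!/bin/sh
# Hypothetical helpers that print the JSON bodies; nothing is sent.

allocation_body() {
    # $1: "none" to stop shard allocation, "all" to resume it
    printf '{"transient":{"cluster.routing.allocation.enable":"%s"}}\n' "$1"
}

replicas_body() {
    # $1: number of replicas to apply to the targeted indexes
    printf '{"index":{"number_of_replicas":%s}}\n' "$1"
}

# Possible usage against a running cluster (assumed to be on localhost:9200):
#   curl -XPUT localhost:9200/_cluster/settings -d "$(allocation_body none)"
#   curl -XPUT localhost:9200/_settings -d "$(replicas_body 0)"
#   curl -XPUT localhost:9200/_cluster/settings -d "$(allocation_body all)"
```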

On Mon, May 5, 2014 at 1:19 PM, Nish electronish@gmail.com wrote:

"- Fire up both nodes, make sure they both have the same cluster
name"
<= This is exactly the step where, as I wrote in my second message,
Elasticsearch is messing up. When I move the index to a new node,
delete that index from master, and then start the master node and the
other data node, it (master) logs this message:
"auto importing dangled indices"
This means master is now copying the "deleted" index, which exists
only on the other node, to itself!

Basically this is what happens:

  1. Node1 (master) has: rock, paper, scissors
  2. I move rock from node1 to node2 (I verify by starting ONLY
    node1 and I can see that I am missing data that was originally in "rock"
    index, as expected, all good)
  3. So node1 now has paper, scissors
  4. I start node2 with ONLY the "rock" index (verify independently,
    it works)
  5. Then I start node1 (master) and node2 (data)
  6. Node1 says "hey I don't have rock, but node2 has it,
    let me copy it to myself"

On Monday, May 5, 2014 3:44:17 PM UTC-4, Nate Fox wrote:

You might turn off the bootstrap.mlockall flag just for now -
it'll make ES swap a ton, but your error message looks like an OS level
issue. Make sure you have lots of swap available and grab some coffee.

What I'd also try if turning off bootstrap.mlockall doesn't work:

  • Tarball the entire data directory and save the tarball somewhere
    (unless you don't care about the data)
  • Set 31GB for your ES heap. There's plenty of docs out there that
    say not to go over 32GB of heap, because past that point the JVM can
    no longer use compressed object pointers.
  • Copy the entire data dir to node2
  • Go into the data dir on node1 and delete half of the indexes
  • Go into the data dir on node2 and delete the other half of the
    indexes
  • Fire up both nodes, make sure they both have the same cluster
    name

I have no idea if this'll work, I'm by no means an ES expert. :slight_smile:
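
For the heap number itself, a minimal sketch of the commonly cited rule of thumb: give ES about half of the machine's RAM, capped at 31GB, and leave the rest to the OS and native allocations. That rule is an assumption brought in here, not something stated in this thread, and the helper name recommend_heap_mb is made up.

```shell
#!/bin/sh
# Hypothetical helper: suggest a heap size in MB from total RAM in MB.
# Assumption: half of RAM, capped at 31GB (31 * 1024 MB), is a sensible start.

recommend_heap_mb() {
    total_mb="$1"
    half=$((total_mb / 2))
    cap=$((31 * 1024))
    if [ "$half" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$half"
    fi
}

# Example for the 60G machine discussed here (60 * 1024 = 61440 MB):
#   export ES_HEAP_SIZE="$(recommend_heap_mb 61440)m"
```

On the 60G box this comes out slightly under 31GB, which leaves roughly the other half for the file cache and for the native memory that the errno=12 failure was asking for.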

On Mon, May 5, 2014 at 12:32 PM, Nish elect...@gmail.com wrote:

Currently I have 279 indexes on a single node, and elasticsearch
starts for a few minutes and dies. I only have 60G of RAM on the machine, and as far as
I know 60% is the max that one should allocate to elasticsearch. I tried
allocating 38G; it lasted a few more minutes and then died.

(I think there's some state files that tell ES/Lucene which
indexes are on disk)
=> Where is this? How do I fix it so that
it doesn't move all indexes to all nodes? I want to split the ~280 indexes
across two nodes of 140 each. So far I am not able to achieve this, as the
master keeps moving indexes to itself!

On Monday, May 5, 2014 3:25:05 PM UTC-4, Nate Fox wrote:

How many indexes do you have? It almost looks like the system
itself can't allocate the RAM needed?
You might try jacking up the nofile to something like 999999 as
well? I'd definitely go with 31g heapsize.

As for moving indexes, you might be able to copy the entire data
store, then remove some (I think there's some state files that tell
ES/Lucene which indexes are on disk), so it might recover if it's missing
some and sees the others on another node?

As for your other questions, it depends on usage as to how many
nodes - especially search activity while indexing. We have 230 indexes
(1740 shards) on 8 data nodes (5.7TB / 6.1B docs). So it can definitely
handle a lot more than what you're throwing at it. We don't search often, nor
do we load a ton of data at once.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAEM624bP1vE9v7%3Dtcro55LX6KKT3bHWNbhuWfMoyd7TGdgP8jQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Nish) #15

OK, I got Marvel up and running on my test instances.

On Mon, May 5, 2014 at 11:27 PM, Mark Walkom markw@campaignmonitor.com wrote:

You need to install a monitoring plugin to gain better insight into what
is happening, it makes things a lot easier to visually see
cluster/node/index state and remove your shell commands, which may not be
100% representative of what ES is actually doing.

I suggest elastichq and marvel.
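
Where a plugin is not an option, ES's own view of shard placement can also be read from the _cat/shards endpoint (available since 1.0) instead of inspecting data directories. The sketch below counts started shards per node from that output. It assumes the default column order (index, shard, prirep, state, docs, store, ip, node) and node names without spaces; shards_per_node is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `_cat/shards` text on stdin and print
# "<node> <count>" for shards in STARTED state, sorted by node name.
# Assumptions: default columns, node name is the last field, no spaces in it.

shards_per_node() {
    awk '$4 == "STARTED" { count[$NF]++ }
         END { for (n in count) print n, count[n] }' | sort
}

# Possible usage against a running node (assumed to be on localhost:9200):
#   curl -s localhost:9200/_cat/shards | shards_per_node
```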

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 6 May 2014 13:06, Nishchay Shah electronish@gmail.com wrote:

FYI settings:

Master:
[root@ip-10-169-36-251 logstash-2013.12.05]# grep -vE "^$|^#" /xx/elasticsearch-1.1.1/config/elasticsearch.yml
cluster.name: elasticsearchtest
node.name: "node1"
node.master: true
node.data: true
index.number_of_replicas: 0
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.169.36.251", "10.186.152.19"]

Non-master:
[root@ip-10-186-152-19 logstash-2013.12.05]# grep -vE "^$|^#" /elasticsearch/es/elasticsearch-1.1.1/config/elasticsearch.yml
cluster.name: elasticsearchtest
node.name: "node2"
node.master: false
node.data: true
index.number_of_replicas: 0
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.169.36.251","10.186.152.19"]

On Mon, May 5, 2014 at 11:01 PM, Nishchay Shah electronish@gmail.com wrote:

Probably not.

I deleted all data from slave and restarted both servers and I see this:

*Master: *
[root@ip-10-169-36-251 logstash-2013.12.22]# du -h --max-depth=1
16M ./0
16M ./1
8.0K ./_state
15M ./4
15M ./3
15M ./2
75M .

*Data: *

[root@ip-10-186-152-19 logstash-2013.12.22]# du -h --max-depth=1
16M ./0
16M ./1
15M ./4
15M ./3
15M ./2
75M .

On Mon, May 5, 2014 at 10:53 PM, Mark Walkom markw@campaignmonitor.com wrote:

Don't copy indexes on the OS level!

Is your new cluster balancing the shards?
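
As an alternative to moving directories by hand, a shard can be relocated through the cluster reroute API, which keeps the cluster state in step with what is on disk. A minimal sketch of the request body follows. The index name and the node names node1/node2 are taken from the test setup described in this thread, and move_shard_body is a hypothetical helper that only prints JSON.

```shell
#!/bin/sh
# Hypothetical helper: print the body of a cluster reroute "move" command.
# Arguments: index name, shard number, source node name, target node name.

move_shard_body() {
    index="$1"; shard="$2"; from="$3"; to="$4"
    printf '{"commands":[{"move":{"index":"%s","shard":%s,"from_node":"%s","to_node":"%s"}}]}\n' \
        "$index" "$shard" "$from" "$to"
}

# Possible usage against a running cluster (assumed to be on localhost:9200):
#   curl -XPOST localhost:9200/_cluster/reroute \
#        -d "$(move_shard_body logstash-2013.12.22 0 node1 node2)"
```

With automatic rebalancing left on, the cluster may still decide to move shards again afterwards, so this is best treated as a one-off nudge rather than a way to pin data to a node.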

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 6 May 2014 12:46, Nishchay Shah electronish@gmail.com wrote:

Hey Mark,
Thanks for the response. I have currently created two new medium test
instances (1 master 1 data only) because I didn't want to mess with the
main dataset. In my test setup, I have about 600MB of data across 7 indexes.

After looking around a lot I saw that the directory organization is
/elasticsearch/es/elasticsearch-1.1.1/data/elasticsearchtest/nodes// and the master node has only 1 directory

(master)

ls /elasticsearch/es/elasticsearch-1.1.1/data/elasticsearchtest/nodes

0

So on node2 I created a "1" directory and moved 1 index from master to
data, so master now has six indexes in 0 and data has one in 1.
When I started elasticsearch after that, I got to a point where the
master is NOT copying the data back to itself, but now node2 is
copying master's data and making a "0" directory. Also, I am unable to
query node2's data!



(system) #16