I'm having some problems trying to get Elasticsearch working at
$WORK. Specifically, the machine it's running on becomes
unresponsive after a few hours of indexing. This is on a virgin
system that doesn't run anything else. Here's my configuration:

Amazon EC2 "Large" instance
Ubuntu 10.10, EBS-backed (Kernel ID aki-427d952b, AMI ID ami-548c783d)
Java 1.6.0_20
Max open files set to 65k, which I verified by watching the log
messages when Elasticsearch starts up
Elasticsearch is spawned by daemontools with the following command
line (a minimal run-script sketch follows this list):
export ES_JAVA_OPTS="-server"; exec /usr/local/elasticsearch/bin/elasticsearch -f -Des.max-open-files=true
Everything else is at its defaults
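For reference, here is a minimal sketch of what a daemontools run script
for this could look like. It is an illustration, not my exact file, and
raising the fd limit with ulimit inside the script is just one way to get
to 65k:

  #!/bin/sh
  # daemontools "run" script -- a minimal sketch, not the exact file in use.
  # Raise the open-file limit before exec'ing Elasticsearch; 65536 matches
  # the 65k limit mentioned above, and ulimit here is just one way to set it.
  ulimit -n 65536
  export ES_JAVA_OPTS="-server"
  # -f keeps Elasticsearch in the foreground so supervise can manage it;
  # -Des.max-open-files=true makes it log the limit it actually sees at startup.
  exec /usr/local/elasticsearch/bin/elasticsearch -f -Des.max-open-files=true 2>&1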
The issue is that Elasticsearch starts up fine and runs fine, but once
I start indexing documents, after some hours the machine hangs. It
still responds to ping, but any TCP connection, such as SSH, times
out. According to CloudWatch, CPU and network usage drop to zero, so
nothing on the machine is actually doing anything. Examining the
machine after the fact doesn't turn up anything interesting;
/var/log/messages contains nothing unusual. I'm guessing some weird
kernel issue is at play, but I can't prove it.
I should also mention that we're subjecting Elasticsearch to a VERY
high write load. Specifically, I'm using node.js to read many small
documents out of a database and add them to Elasticsearch. We have
about 50 million documents in total, stored in a single index and
written at a rate of 2,000 documents per second. In case it's worth
mentioning, those documents have a field that comes from a MySQL
BINARY column (JSON doesn't really handle raw binary data).
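To make the shape of that load concrete, here is roughly what one of
those index requests looks like as a curl call; the index, type, and
field names are placeholders, and the BINARY column is shown
base64-encoded since JSON has no raw binary type:

  # One hypothetical document out of the ~50 million; "documents", "doc",
  # and the field names below are made up, not the real schema.
  curl -s -XPUT 'http://localhost:9200/documents/doc/12345' -d '{
    "title":   "example",
    "payload": "3q2+7w=="
  }'
  # "payload" stands in for the MySQL BINARY column, base64-encoded because
  # JSON cannot carry raw bytes. At our rate that is roughly 2,000 requests
  # like this per second, assuming one request per document.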
FWIW, the MySQL database is an RDS instance, and has given us zero
problems.
I'm out of ideas, because I've never run into a problem like this
before. Does anyone have suggestions for parameters I can check or
tweak, or things I can log, to get to the bottom of this? Any help
would be appreciated.
Same problem for me with a large index rebuild (2,620,000+ documents),
and it also uses too much memory (5 GB+).
I can confirm that in my case memory is NOT an issue. The instance in
question has ~7 GB of RAM, and according to my Munin graphs, total
memory usage on that box doesn't go above 1.5 GB.
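For what it's worth, Munin's graph is OS-level memory; to cross-check
what the JVM itself is doing, something along these lines against the
Elasticsearch process works (the pgrep pattern is just an example):

  # Print heap occupancy and GC activity for the Elasticsearch JVM every 5s.
  # pgrep -f -n picks the newest process whose command line matches.
  jstat -gcutil "$(pgrep -f -n elasticsearch)" 5000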
If you can, use a larger instance (m1.xlarge); it "suffers" less from
noisy neighbors on AWS.
The fact that you can't connect to the machine might mean you are
running out of sockets and they are being throttled by the OS. Can you
monitor that? Are you using persistent connections to ES from node?
netstat and lsof are your friends here for checking it.

I also saw this behavior a while back with the AWS problems on Ubuntu
10.04, and recently I've started to read that 10.10 is exhibiting
similar behavior (see the Instagram blog).
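For the socket check suggested above, a couple of commands along these
lines (run periodically while the indexer hammers the box) would show
whether connection or file-descriptor counts are creeping up; the pgrep
pattern is again just an example:

  # Count TCP connections by state; piles of TIME_WAIT or CLOSE_WAIT would
  # point at connection churn or leaks.
  netstat -ant | awk 'NR>2 {print $6}' | sort | uniq -c | sort -rn

  # Count file descriptors held by the Elasticsearch process.
  lsof -p "$(pgrep -f -n elasticsearch)" | wc -l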
#1 and #2 are easy enough for me to do. #3 might be more of a
challenge, since it will cost us more.
The connections are not persistent, but I use the generic-pool module
for node.js to limit the pool to 10 slots, i.e. at most 10 concurrent
connections to Elasticsearch. I did check things with netstat and
lsof, and there are no issues there.
Interesting, as that's the first I've heard of any concerns with
10.10. We're running our entire infrastructure on 10.10 and average
about one freeze like this per machine every six months (current
issues excepted).