I am still having trouble with Elastic Search. After a day of tens of
thousands of inserts via the bulk API, we notice that Elastic Search
hangs: the REST API stops responding and the CPU sits at 100 percent.
The box has 16 GB of RAM and a 7200 RPM disk.
Is there somewhere I can look in the logs for what is causing this?
The logs show no exceptions. The only unusual thing we are doing is
using the ID field to ensure uniqueness across the cluster.
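For reference, a minimal Python sketch against the REST bulk endpoint of the kind of request with explicit IDs described above; the index name, type name, and document fields are made up for illustration, and the _type line reflects 0.16-era versions, which still required it:

import json, urllib.request

# Each document gets an explicit _id, so re-sending the same record overwrites it
# instead of creating a duplicate -- the "uniqueness across the cluster" trick.
docs = [{"user_id": "u-1001", "msg": "hello"}, {"user_id": "u-1002", "msg": "world"}]

lines = []
for d in docs:
    lines.append(json.dumps({"index": {"_index": "events", "_type": "event", "_id": d["user_id"]}}))
    lines.append(json.dumps(d))
body = ("\n".join(lines) + "\n").encode("utf-8")   # the bulk body must end with a newline

req = urllib.request.Request("http://localhost:9200/_bulk", data=body,
                             headers={"Content-Type": "application/x-ndjson"})
print(urllib.request.urlopen(req).read().decode())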
Those are tricky to track down, so let's start from the basics:
I assume you are running on your own hardware, based on the configuration you posted? Or are you running on a virtualized system? Which operating system are you using?
Maybe you are running into memory problems with ES? How much memory are you allocating to elasticsearch? Can you run bigdesk against it and see how memory behaves? (Upgrade to 0.16.4 if it happens.)
Are you using the bootstrap.mlockall option? If not, can you enable it (just to make sure it's not swapping)?
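For what it's worth, a minimal elasticsearch.yml sketch of the mlockall setting being asked about; the heap size is purely illustrative, and the locked-memory note is general JVM/OS behavior, not something stated in this thread:

# config/elasticsearch.yml -- lock the JVM heap into RAM so the OS cannot swap it out
bootstrap.mlockall: true

# mlockall only helps if the heap is sized explicitly and the user running ES is
# allowed to lock memory (ulimit -l unlimited). The heap is sized via the usual JVM
# flags, e.g. -Xms8g -Xmx8g (same value for both), leaving headroom below the box's 16 GB.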
-shay.banon
It looks like you ran out of sockets, not RAM. Sockets use file
handles. Check whether the ES process is getting the OS's maximum
open files setting; it should be on the order of 10,000. If
the value is okay, check your ES client code. Clients should reuse
socket connections; otherwise unused socket connections will thrash
the server and indexing will become slow. Do not repeatedly open and
close connections when using bulk indexing.
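A rough, Linux-only way to check this from the OS side (the pgrep pattern is a guess at the Elasticsearch main class, so adjust it for your install, and run it as the ES user or as root so /proc is readable):

import os, subprocess

# Find the Elasticsearch JVM -- the match pattern is an assumption, adjust as needed.
pid = subprocess.check_output(["pgrep", "-f", "org.elasticsearch.bootstrap"]).split()[0].decode()

# The limit the running process actually got, not just what ulimit prints in your shell.
with open("/proc/%s/limits" % pid) as f:
    for line in f:
        if line.startswith("Max open files"):
            print(line.rstrip())

# How many descriptors (files plus sockets) it is holding right now.
print("fds in use:", len(os.listdir("/proc/%s/fd" % pid)))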
I highly doubt it's that. The servers are configured to reuse the
socket and close it as soon as possible. They are also configured with
the maximum setting for the number of sockets that can be open. The bulk
index call is made rarely, only once we have enough records to send to the
first node. However, I have been testing it and I have a theory.
Basically there are two nodes in the cluster. If I just keep one node
and disconnect the other one, it runs fine. However, once the
second node is activated it begins to show issues. The second node
runs out of memory, which then has an adverse effect on node 1,
and the entire cluster goes down. I am not sure yet what is causing
node 2 to run out of memory.
The server should not close sockets as soon as possible. Just tell
your client to reuse socket connections.
All I can say is that I once encountered exactly the same challenge.
As it turned out, the second node had, by accident, not gotten the max
open files setting. After I fixed that, everything was fine.
But if you submit large numbers of simultaneous bulk requests, the
caches fill up quickly and several Lucene indexing background
processes kick off. This suddenly consumes a lot of system resources,
both file handles and sockets. Because of this, I had to limit the
number of parallel bulk requests on the client side. There is a
"sweet spot" between client bulk-indexing pressure and available
network/OS capacity.
Currently I submit at most 30 bulk requests in parallel. If the limit
is reached, which happens every few minutes, the bulk indexing client
has to wait for the server to finish the previous requests, which can
take some time; here it may take up to 20 seconds or so, depending
on the work the Lucene indexers must do.
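A minimal Python sketch of that kind of client-side throttle, assuming a hypothetical send_bulk() helper that POSTs one bulk body to /_bulk; the limit of 30 mirrors the number above, everything else is illustrative:

import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 30                                  # the "sweet spot" found by trial and error
slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)
pool = ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT)

def submit_bulk(ndjson_body):
    # Block until one of the 30 slots is free, then hand the request to the pool.
    slots.acquire()
    def run():
        try:
            send_bulk(ndjson_body)                  # hypothetical helper doing the HTTP POST
        finally:
            slots.release()                         # free the slot whether the request succeeded or not
    return pool.submit(run)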