The server should not be closing sockets as soon as possible. Instead,
tell your client to reuse socket connections.
All I can say is that I once ran into exactly the same challenge. As it
turned out, the second node had accidentally not received the max open
files setting. After I fixed that, everything was fine.
But if you submit large numbers of simultaneous bulk requests, the
caches fill up quickly and several Lucene indexer background processes
kick off. This suddenly consumes a lot of system resources: file
handles and sockets. Because of this, I had to limit the number of
parallel bulk requests on the client side. There is a "sweet spot"
between client bulk indexing pressure and available network/OS capacity.
Currently, I submit at most 30 bulk requests in parallel. When the
limit is reached, which happens every few minutes, the bulk indexing
client has to wait for the server to finish the previous requests,
which can take some time. Here it may take up to 20 seconds or so,
depending on the work the Lucene indexers must do.
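The throttling described above can be sketched with a semaphore that caps in-flight bulk requests and blocks the producers once the cap is hit. This is just an illustration of the idea, not the poster's actual client; `send_bulk` is a hypothetical stand-in for the real HTTP call to the _bulk endpoint:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 30  # the "sweet spot" found above; tune for your setup
_slots = threading.Semaphore(MAX_IN_FLIGHT)

def send_bulk(payload):
    # stand-in for the real HTTP POST to the _bulk endpoint
    return len(payload)

def submit_bulk(payload):
    # blocks while MAX_IN_FLIGHT requests are already outstanding,
    # pushing back-pressure onto the producers
    with _slots:
        return send_bulk(payload)

# many producer threads, but never more than 30 requests in flight
with ThreadPoolExecutor(max_workers=60) as pool:
    results = list(pool.map(submit_bulk, [["doc"] * 5] * 200))
```

The point is that the producers stall instead of the server: when the cluster is busy with Lucene merges, the client simply waits.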
On 17 Jul., 06:07, electic elec...@gmail.com wrote:
I highly doubt it's that. The servers are configured to reuse the
socket and close it as soon as possible. They are also configured with
the max setting for the number of sockets that can be open. The bulk
index is called rarely, once we have enough records to send to the
first node. However, I have been testing it and I have a theory.
Basically there are two nodes in the cluster. If I keep just one node
and disconnect the other one, it runs fine. However, once the second
node is activated it begins to show issues. The second node runs out
of memory, which then has an adverse effect on node 1, and the entire
cluster goes down. I am not sure what is causing node 2 to run out of
memory yet.
On Jul 16, 12:11 pm, jprante joergpra...@gmail.com wrote:
It looks like you ran out of sockets, not RAM. Sockets use file
handles. Check whether the ES process is allowed the maximum number of
open files the OS can grant. It should be on the order of 10,000 or
more. If the value is OK, check your ES client code. Clients should
reuse socket connections; otherwise unused socket connections will
thrash the server and indexing will become slow. Do not repeatedly
open and close connections when using bulk indexing.
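A quick way to compare a process's open file descriptors against its limit is to count the entries under /proc/<pid>/fd and read RLIMIT_NOFILE (what `ulimit -n` reports). A minimal sketch, assuming a Linux host (the /proc path does not exist elsewhere):

```python
import os
import resource

def open_file_budget(pid=None):
    # Soft limit on open files for this process (ulimit -n)
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    pid = pid or os.getpid()
    try:
        # Counting /proc/<pid>/fd is Linux-specific
        used = len(os.listdir(f"/proc/{pid}/fd"))
    except FileNotFoundError:
        used = -1  # /proc not available on this platform
    return used, soft

used, soft = open_file_budget()
print(f"{used} fds open, soft limit {soft}")
```

If `used` is anywhere near `soft` on the ES process while indexing, raise the OS limit or fix connection reuse in the client first.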
On Jul 15, 9:46 pm, electic elec...@gmail.com wrote:
Upgraded to 0.16.4; same situation with min and max set to the same
value. Ran top on the box and it throws:
On node 2:
-bash: fork: Cannot allocate memory
-bash: fork: Cannot allocate memory
On node 1:
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+
20852 root  20   0 10.7g  10g  10m S  100 69.3 182:55.07
So I am pretty sure it is a memory problem. What would cause memory to
get exhausted like this?
On Jul 15, 12:01 am, electic elec...@gmail.com wrote:
Sorry about that, sloppy pasting. Here it is again:
17879 ? Sl 47:26 /usr/bin/java -Xms256m -Xmx10g -Xss128k -Djline.enabled=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.path.home=/usr/local/elasticsearch -Des-pidfile=/var/run/elasticsearch.pid -cp elasticsearch -Des.path.logs=/var/log/elasticsearch -Des.path.data=/
I will change the parameters and follow the suggestions, and will be
back with an update. Thanks for your help!
On Jul 14, 10:31 pm, Shay Banon shay.ba...@elasticsearch.com wrote:
Use the same value for MIN and MAX; it's recommended, especially if you use bootstrap.mlockall. It's explained here: http://www.elasticsearch.org/guide/reference/setup/installation.html.
Also, I only see part of the process parameters, I assume you have the rest of the parameters set...
When you run it, can you run bigdesk against it and check if it's maxing out on memory? https://github.com/lukas-vlcek/bigdesk
Last, can you gist your config?
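For reference, the two settings mentioned above live in elasticsearch.in.sh and config/elasticsearch.yml. A sketch with example values (10g is purely illustrative; size it for your box, and leave headroom for the OS):

```
# elasticsearch.in.sh: set min and max heap to the same value
ES_MIN_MEM=10g
ES_MAX_MEM=10g

# config/elasticsearch.yml: lock the process memory to avoid swapping
bootstrap.mlockall: true
```

Setting min equal to max avoids heap-resize pauses, and mlockall only works reliably when the whole heap is committed up front.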
On Friday, July 15, 2011 at 8:18 AM, electic wrote:
- Yep, we have two nodes. Exact same config. And it is our own
hardware.
- I have changed elasticsearch.in.sh with a min of 512MB and a max of 10GB.
Here is the process running:
1636 pts/0 Sl 13:56 /usr/bin/java -Xms256m -Xmx10g -Xss128k -
- I am not sure what this means, can you elaborate. Happy to run any
debug on our test machines.
On Jul 14, 10:15 pm, Shay Banon shay.ba...@elasticsearch.com wrote:
Those are tricky to track down, so let's start from the basics:
- I assume you run on your own hardware based on the configuration you posted? Or are you running on a virtualized system? Which operating system are you using?
- Maybe you are running into memory problems with ES? How much memory are you allocating to elasticsearch? Can you run bigdesk against it and see how memory behaves? (Upgrade to 0.16.4 if it happens.)
- Are you using the bootstrap.mlockall option? If not, can you use it (just to make sure it's not swapping)?
On Friday, July 15, 2011 at 7:34 AM, electic wrote:
I am still having trouble with Elasticsearch. After a day of tens of
thousands of inserts via the bulk API, we notice that Elasticsearch
hangs. The REST API stops responding and the CPU is at 100 percent.
The box has 16GB of RAM and a 7200 RPM disk.
Is there somewhere I can look in the logs for what is causing this?
The logs show no exceptions. The only kooky thing we are doing is
using the ID field to ensure uniqueness across the cluster.
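For context, pinning the ID in a bulk request looks roughly like the following. This is a generic illustration of the newline-delimited _bulk body format of that era (which still included _type); the index and type names are made up:

```python
import json

def bulk_body(docs, index="myindex", doc_type="doc"):
    # Builds the newline-delimited _bulk request body, pinning each
    # document's _id so a re-submitted record overwrites rather than
    # duplicates. "myindex"/"doc" are placeholder names.
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline

body = bulk_body([("1", {"msg": "hello"}), ("2", {"msg": "world"})])
```

Supplying your own _id is a normal pattern for cluster-wide uniqueness; by itself it should not hang a node, though every indexed doc with an explicit ID does cost a lookup.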