Problems with TCP connections

On Nov 15, 10:58 pm, Ocean Wu darkyo...@gmail.com wrote:

Hi,

I have two servers running an Elasticsearch cluster as our website's search
engine, and we use Elastica as our PHP client.

At the beginning, queries were sent directly to ES, but the servers were very
unstable: TCP connections climbed to about 500-600, ES couldn't handle them
quickly, and we constantly got timeout responses (we set the timeout to 5s).
So we added a 5-minute cache with memcached and the situation got better;
TCP connections now average around 10.
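
The wrapper is essentially this pattern (a minimal sketch, not our production
code: it uses raw curl instead of Elastica, and the index name, key scheme,
and error handling are all illustrative):

    <?php
    // Minimal sketch: try memcached first, fall back to ES over HTTP.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    function cachedSearch(Memcached $mc, $queryJson) {
        $key = 'es:' . md5($queryJson);        // cache key derived from the query body
        $hit = $mc->get($key);
        if ($hit !== false) {
            return $hit;                       // served from the 5-minute cache
        }
        // The index name "products" is illustrative.
        $ch = curl_init('http://localhost:9200/products/_search');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $queryJson);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);  // the same 5s timeout
        $result = curl_exec($ch);
        curl_close($ch);
        if ($result !== false) {
            $mc->set($key, $result, 300);      // 5-minute TTL
        }
        return $result;
    }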

I found that once the connection count goes over 100, things become very unstable.

Is this because the servers can't handle that many requests, or do I need to
optimize my queries? (Most queries take about 50 ms.)

Here is a gist of the node stats at one such point:
https://gist.github.com/1369446
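
(The stats were pulled with something along these lines:)

    curl -s 'http://localhost:9200/_cluster/nodes/stats?pretty=true'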

Did you update your limits.conf? The number of allowed connections might be
maxed out, which would explain why you are getting the timeouts.
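
For example, in /etc/security/limits.conf (the user name and values here are
just illustrative):

    # raise the open-file/socket limit for the user running ES
    elasticsearch  soft  nofile  32000
    elasticsearch  hard  nofile  32000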

On Nov 16, 12:08 am, Ocean Wu darkyo...@gmail.com wrote:

Yes, limits.conf is set to 32000, and net.nf_conntrack_max is set to 655360.

Thanks for the reply.
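
(The values actually in effect can be double-checked with something like:)

    # per-process open file limit for the current user
    ulimit -n
    # connections currently tracked vs. the configured maximum
    sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max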

On Wed, Nov 16, 2011 at 11:04 AM, electic electic@gmail.com wrote:

Then we are having the same issue:

https://groups.google.com/group/elasticsearch/browse_thread/thread/1861b5c253982c75

I notice that when my total index size exceeds RAM (16GB per machine), queries
start to take a bit longer. Once the connections pile up, the entire cluster
becomes massively unstable and crashes. My theory is that as the dataset grows,
what was once a fast query suddenly becomes slow (my queries fetch data from a
certain time range and sort), and I think that might be what is killing the
cluster.
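
The queries are roughly this shape (illustrative sketch; the field name and
values are made up):

    {
      "query": {
        "filtered": {
          "query":  { "match_all": {} },
          "filter": { "range": { "timestamp": { "from": 1321300000, "to": 1321400000 } } }
        }
      },
      "sort": [ { "timestamp": { "order": "desc" } } ]
    }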

-R

On Nov 16, 6:29 am, Shay Banon kim...@gmail.com wrote:

Can you try 0.18.3 and see if it helps? It might be related to the fix for the
connection problem while searching.

Sweet. Okay, I am running a test now. Will report back on any changes.

On Nov 16, 5:53 pm, Ocean Wu darkyo...@gmail.com wrote:

Seems better after I upgraded to 0.18.3.

On Nov 17, 10:48 am, electic elec...@gmail.com wrote:

So I still seem to be having the same issue. I have two machines with 16GB RAM
each and a 10,000 RPM drive. As the data size increased to about 20GB total
(20 million documents), queries seem to be taking longer, and connections back
up until the node no longer seems to accept HTTP requests.

The logs show nothing. There is no huge CPU or heap usage; it just goes dead.
Any ideas on what I can paste here in terms of logs to debug the issue?
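
(For instance, I could capture something like the following while it hangs,
if that would help:)

    # JVM thread dump of the ES process (<es-pid> = the ES java pid)
    jstack <es-pid> > threads.txt
    # node-level stats
    curl -s 'http://localhost:9200/_cluster/nodes/stats?pretty=true'
    # how many connections are piled up on the HTTP port
    netstat -an | grep 9200 | wc -l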

On Nov 17, 11:21 am, electic elec...@gmail.com wrote:

Here is my status after restarting the second node (the node that handles all
the query requests):

https://raw.github.com/gist/1374154/dc4df73f7fecb81491823ea7c51a6e00fa2c2ae3/gistfile1.txt

I think this might have something to do with the merge policy. It is happening
around the 20GB mark. Any ideas?
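
(The knob I mean is e.g. index.merge.policy.merge_factor in elasticsearch.yml;
the value below is just the default, shown for reference:)

    # merge factor of the (0.18-era) log merge policy
    index.merge.policy.merge_factor: 10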

Are you using keep-alive / persistent connections? If you open and close
connections constantly, maybe the OS is throttling them.
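
In PHP, for example, reusing a single curl handle keeps the underlying TCP
connection to ES open instead of reconnecting for every request (illustrative
sketch; the URL and index name are assumptions):

    <?php
    // One handle reused across requests => curl keeps the connection alive.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, 'http://localhost:9200/products/_search');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    foreach ($queries as $queryJson) {      // $queries: array of JSON bodies
        curl_setopt($ch, CURLOPT_POSTFIELDS, $queryJson);
        $responses[] = curl_exec($ch);      // same handle, same connection
    }
    curl_close($ch);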
