Elasticsearch throughput decreases behind a load balancer... why?


(Dario Rossi) #1

I have two instances of ES in production, and we put them behind a
load balancer that does round-robin balancing and access control. I know
that the recommended way to balance ES is to use a non-data node, but we
need access control.

So these are the results after some testing with JMeter:

WITHOUT LOAD BALANCER (only one instance):

Throughput 165M requests/minute, ~28ms average latency.

WITH THE LOAD BALANCER (here one would expect the throughput to roughly
double and the latency to increase):

Throughput 124M requests/minute, ~85ms average latency.

And that's unexpected: I am all right with the latency increasing, but
not with the throughput decreasing.

We are using NGINX for load balancing and access control, and I've set
index.auto_expand_replicas to 0-all.

Any ideas why I am getting this result?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

Does the time you measured come from the took field?
Or is it the time you see on the client side?

Could you measure both?
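
A minimal sketch of measuring both numbers for a single search request, using only the Python standard library. The URL, the query, and the helper names `http_post` and `timed_search` are assumptions for illustration; they are not taken from the setup described in this thread.

```python
import json
import time
import urllib.request


def http_post(url, body):
    """POST the raw body to url and return the raw response bytes."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def timed_search(url, query, send=http_post):
    """Return (took_ms, client_ms) for one search request.

    took_ms is the "took" value reported in the Elasticsearch response body;
    client_ms is the wall-clock time observed by the caller.
    """
    body = json.dumps(query).encode("utf-8")
    start = time.perf_counter()
    raw = send(url, body)
    client_ms = (time.perf_counter() - start) * 1000.0
    took_ms = json.loads(raw)["took"]
    return took_ms, client_ms
```

Calling `timed_search("http://localhost:9200/_search", {"query": {"match_all": {}}})` once against a node directly and once against the NGINX address (both addresses hypothetical) would give the two pairs to compare.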

Does Nginx use persistent connections?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

On 21 November 2013 at 17:57:11, Dario Rossi (darioros@gmail.com) wrote:



(Dario Rossi) #3

The time measurement comes from JMeter and is the average response time
(from connection open to close). So yes, *this is the time on the client
side.* I'm not interested in the "took" field.

I don't know whether NGINX uses persistent connections... I have to ask
our techops. I will do that tomorrow (he's gone ...). Why? What would be
the best arrangement?

On Thursday, 21 November 2013 at 17:10:05 UTC, David Pilato wrote:



(David Pilato) #4

Because if you are opening a new connection each time you send a request, you are probably spending some time on that.
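
The cost of opening a connection per request can be checked with a small comparison like the one below. This is only a sketch: the stub server stands in for a real backend (it is not Elasticsearch or NGINX), and the names `start_local_server` and `time_requests` are made up for the example.

```python
import http.client
import http.server
import threading
import time


class _StubHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in backend (not Elasticsearch) that answers every GET with '{}'."""

    protocol_version = "HTTP/1.1"  # needed so the server keeps connections open

    def do_GET(self):
        body = b"{}"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the output quiet


def start_local_server():
    """Start the stub backend on a free loopback port and return the server."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), _StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def time_requests(host, port, path, count, reuse):
    """Return the total seconds spent on `count` GET requests.

    reuse=True sends every request over one persistent connection;
    reuse=False opens and closes a new connection for each request.
    """
    shared = http.client.HTTPConnection(host, port) if reuse else None
    start = time.perf_counter()
    for _ in range(count):
        conn = shared if reuse else http.client.HTTPConnection(host, port)
        conn.request("GET", path)
        conn.getresponse().read()  # drain the body so the socket can be reused
        if not reuse:
            conn.close()
    elapsed = time.perf_counter() - start
    if shared is not None:
        shared.close()
    return elapsed
```

Running `time_requests` with `reuse=True` and then `reuse=False` against the same host shows how much of the total time goes into connection setup. On loopback the gap is small; across a real network, or with a proxy that opens a fresh upstream connection per request, it is likely to be larger.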

That said, I am interested in the mean value of the took field.
It could help to make the right diagnosis: is the problem coming from Elasticsearch or from NGINX?
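
As a rough illustration of that diagnosis, the sketch below averages the took values and the client-side times from a test run and reports the difference. The function name `summarize` and the sample numbers in the usage note are hypothetical. It assumes that took covers only the time Elasticsearch spends executing the search, so the remainder is spent elsewhere (network, proxy, connection handling).

```python
from statistics import mean


def summarize(samples):
    """Summarize (took_ms, client_ms) pairs collected during a test run.

    A large mean_overhead_ms with a stable mean_took_ms points away from
    Elasticsearch and toward whatever sits between it and the client.
    """
    samples = list(samples)
    mean_took = mean(took for took, _ in samples)
    mean_client = mean(client for _, client in samples)
    return {
        "mean_took_ms": mean_took,
        "mean_client_ms": mean_client,
        "mean_overhead_ms": mean_client - mean_took,
    }
```

For example, `summarize([(10, 30.0), (20, 50.0)])` gives a mean took of 15 ms, a mean client time of 40 ms, and therefore 25 ms of overhead outside Elasticsearch.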


On 21 November 2013 at 18:32:54, Dario Rossi (darioros@gmail.com) wrote:



(Dario Rossi) #5

I really think the problem comes from NGINX, because if I run the test by
sending requests directly to a machine in the cluster, bypassing NGINX, it
is much faster.

On Thursday, 21 November 2013 at 17:35:22 UTC, David Pilato wrote:



(system) #6