Performance and preference _local


(Ridvan Gyundogan) #1

Hi,
I do some performance tests and have some questions regarding this.
Our architecture for the tests is very simple we have ec2 image
containing ES 16.2 and tomcat.
The data for the ES is stored in s3 gateway - only 6k documents.

The tomcat makes search requests to the local ES instance and returns
the response to jmetter. No communication between the tomcats.

We have 1 index with 5 shards and 1 replica.

When doing tests against 1 ec2 instance we are stable at 80 jmetter
threads.

When doing the tests against 2 ec2 instances we are stable at 140
threads. Going to 160 threads leads to timeout errors. We use
searchRequestBuilder.setPreference("_local") before each request.

My expectations are that if each of the ec2 instances has all the
shards or their replicas there should not be communication between the
2 ES nodes for the search requests. So we should be stable at 160
threads or not?

Do you know how to check whether there is ore there is no network
traffic between the 2 ES nodes related to the search requests?


(Ridvan Gyundogan) #2

To add something to this:
when under load if I search for the count of the open files from the
es server i see that the second server which is supposed to be the
replica has 2 times the open files of the first server which holds the
primary shards. Any idea?

On Aug 3, 8:45 pm, Ridvan Gyundogan ridva...@gmail.com wrote:

Hi,
I do some performance tests and have some questions regarding this.
Our architecture for the tests is very simple we have ec2 image
containing ES 16.2 and tomcat.
The data for the ES is stored in s3 gateway - only 6k documents.

The tomcat makes search requests to the local ES instance and returns
the response to jmetter. No communication between the tomcats.

We have 1 index with 5 shards and 1 replica.

When doing tests against 1 ec2 instance we are stable at 80 jmetter
threads.

When doing the tests against 2 ec2 instances we are stable at 140
threads. Going to 160 threads leads to timeout errors. We use
searchRequestBuilder.setPreference("_local") before each request.

My expectations are that if each of the ec2 instances has all the
shards or their replicas there should not be communication between the
2 ES nodes for the search requests. So we should be stable at 160
threads or not?

Do you know how to check whether there is ore there is no network
traffic between the 2 ES nodes related to the search requests?


(Shay Banon) #3

When you set the preference to local, then if it can, it will prefer
execution on the local node, yes, but, I don't think that thats what will
help your case.

Regarding hte number of open files, they can be different between nodes. The
fact that they have more or less primary shards is not relevant, they might
simply be on a different merge state of the index (the internal optimization
that goes on).

I would check a few things in your case: First, check that jmeter is not
going through memory problems. Also, I found that jmeter is not behaving
properly with many threads executing, I suspect that its because of bad
concurrency implementation on the stats data structures it has.

Second, see that you don't overflow the system. Check if you get into big
iowait, check the load, and the likes. Also, on AWS, you might end up being
on the same machine where netflix is busy transcoding inception or
something...

On Thu, Aug 4, 2011 at 5:05 PM, Ridvan Gyundogan ridvansg@gmail.com wrote:

To add something to this:
when under load if I search for the count of the open files from the
es server i see that the second server which is supposed to be the
replica has 2 times the open files of the first server which holds the
primary shards. Any idea?

On Aug 3, 8:45 pm, Ridvan Gyundogan ridva...@gmail.com wrote:

Hi,
I do some performance tests and have some questions regarding this.
Our architecture for the tests is very simple we have ec2 image
containing ES 16.2 and tomcat.
The data for the ES is stored in s3 gateway - only 6k documents.

The tomcat makes search requests to the local ES instance and returns
the response to jmetter. No communication between the tomcats.

We have 1 index with 5 shards and 1 replica.

When doing tests against 1 ec2 instance we are stable at 80 jmetter
threads.

When doing the tests against 2 ec2 instances we are stable at 140
threads. Going to 160 threads leads to timeout errors. We use
searchRequestBuilder.setPreference("_local") before each request.

My expectations are that if each of the ec2 instances has all the
shards or their replicas there should not be communication between the
2 ES nodes for the search requests. So we should be stable at 160
threads or not?

Do you know how to check whether there is ore there is no network
traffic between the 2 ES nodes related to the search requests?


(system) #4