Trying to understand search scalability

Hi,

I guess that's another noob question. I'm trying to figure out how to
improve search scalability: the size of my index won't grow much, but
the number of concurrent queries will increase (our customer estimates
30 req/s in the long term).

Right now, I have
- 4 test machines, dual cores
- 1 index
- 2 shards, 1 replica

The shards are allocated like this:
0: server2, server3
1: server1, server4
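
For reference, here is roughly how such an index can be created through
the Java admin API (just a sketch: "docs" is a placeholder index name,
not my actual one, and client is an existing
org.elasticsearch.client.Client):

// Sketch only: a 2-shard / 1-replica index via the 0.x Java admin API.
// Settings builder: org.elasticsearch.common.settings.ImmutableSettings
client.admin().indices().prepareCreate("docs")
    .setSettings(ImmutableSettings.settingsBuilder()
        .put("index.number_of_shards", 2)
        .put("index.number_of_replicas", 1)
        .build())
    .execute().actionGet();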

To test it, I run a client (on my development PC) that starts 2 threads
which concurrently send queries to the cluster. Each thread runs a
series of ~6000 queries. Some queries are short, some take longer to
execute (from 30 ms to 800 ms). Running this, I see that both cores of
server2 and server4 are fully used (100%), while server1 and server3 are
unused. No queries seem to be sent to those servers.
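
In outline, the test client does something like this (a rough sketch:
the thread count, index name and query are placeholders; the real test
mixes short and long queries):

// Sketch of the benchmark loop: N client threads, each firing a series
// of searches through the shared (final) Client. matchAllQuery() and
// "docs" stand in for the real queries and index name.
// java.util.concurrent.*, org.elasticsearch.index.query.QueryBuilders
ExecutorService pool = Executors.newFixedThreadPool(2);
for (int t = 0; t < 2; t++) {
    pool.submit(new Runnable() {
        public void run() {
            for (int i = 0; i < 6000; i++) {
                client.prepareSearch("docs")
                    .setQuery(QueryBuilders.matchAllQuery())
                    .execute().actionGet();
            }
        }
    });
}
pool.shutdown();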

At first, I thought my sharding and replica configuration was wrong (I
had kept the default configuration of 5 shards and 1 replica), because
I didn't see any improvement in the number of requests processed per
second by the cluster over the test period when adding nodes. That's why
I switched to a simpler configuration with 2 shards and 1 replica, and
noticed that two of the four nodes seem to be unused.

Is this a sharding/replica configuration issue, or should I tune other
parameters?

Thanks a lot.

Update: I increased my number of parallel client threads to 8, and it
did not help: it's always the same two servers that execute the queries.
Tell me if I'm wrong, but it seems that using the Java API (NodeBuilder
+ client mode), there's no automatic round robin available. I have found
in the list archives that it is recommended to query nodes alternately,
but there doesn't seem to be any way to do this using a
Client.prepareSearch() builder. I have worked around this using multiple
cached clients (one per thread, see the sketch below), but:
- it seems difficult to rely on this in an application server
- sometimes nodes are still unused (I guess it depends on the discovery
  order)
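
Concretely, the workaround looks more or less like this (simplified
sketch; nodeBuilder() is the static factory from
org.elasticsearch.node.NodeBuilder):

// Simplified sketch of the per-thread cached client workaround: each
// thread lazily builds and keeps its own client-only node.
private static final ThreadLocal<Client> THREAD_CLIENT = new ThreadLocal<Client>() {
    @Override
    protected Client initialValue() {
        return nodeBuilder().data(false).client(true).node().client();
    }
};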

NodeClient will do round robin between shards and replicas. Can you try
executing the same query (the more complex one) and see if you get load
on all servers?

Thanks for your answer. I create the client using the following code:

// nodeBuilder is assumed to be an org.elasticsearch.node.NodeBuilder
// instance (obtained via NodeBuilder.nodeBuilder())
Node node = nodeBuilder.data(false).client(true).node(); // was: nodeBuilder.data(false).node()
final Client client = node.client();

The resulting Client instance is a NodeClient. I then send the same
query (which takes more than a second to complete) concurrently and
repeatedly. I can only see load on two of the servers. Is it a problem
with the way I create the node?

Can you enable logging for the discovery module on the client side, so
you can verify that it manages to connect to all servers?

Also, if you switch to testing with the TransportClient, it will send
the search request to one node, and that node will do the distributed
search. See if it happens in this case as well...
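
Something along these lines (hostnames are the ones from your mails,
9300 is the default transport port, and the default cluster name is
assumed):

// Sketch: a TransportClient pointed at the four test nodes (0.x API).
// org.elasticsearch.client.transport.TransportClient
// org.elasticsearch.common.transport.InetSocketTransportAddress
TransportClient client = new TransportClient();
client.addTransportAddress(new InetSocketTransportAddress("server1", 9300))
      .addTransportAddress(new InetSocketTransportAddress("server2", 9300))
      .addTransportAddress(new InetSocketTransportAddress("server3", 9300))
      .addTransportAddress(new InetSocketTransportAddress("server4", 9300));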

On 21/06/2011 11:54, Shay Banon wrote:

> Can you enable logging for the discovery module on the client side,
> so you can verify that it manages to connect to all servers?

Can I do this programmatically? I don't have a logging.yaml file on the
client side.

> Also, if you switch to testing with the TransportClient, it will send
> the search request to one node, and that node will do the distributed
> search. See if it happens in this case as well...

It works with a TransportClient: I do see load balancing. I have run
multiple tests that I have difficulty interpreting. Here are the
results:

Conc. threads   Total time (ms)   Req/s   Avg./req (ms)
2               187740            5       396
3               146232            6       414
4               106218            8       447
5               109316            8       473
6               134498            7       656
7               141924            6       751
8               115698            8       962
9               139251            6       1165

In each test run, the same total number of queries is sent. The first
column is the number of concurrent threads launching queries on the
client side. The second column is the total execution time of the test.
The third is the number of requests processed per second, and the last
is the average time taken per query (per thread: the time measured is
the execution time of a query within a single thread, not the total
execution time divided by the total number of queries).

The cluster seems to be evenly used with 3 or 4 concurrent threads.
Starting from 5 threads, I can see that one node is fully loaded while
the others are partly idle. With 8 concurrent threads, it seems there's
a "privileged pair" that runs most of the queries (each server holds one
shard, so two servers are required to complete a query). Looking at
those results, the best configuration is 4 concurrent threads, but it
can only handle 8 requests/second and the servers are not fully used.
With 9 concurrent threads, one of the servers was unused... We can see
that this load balancing problem makes the average response time
increase (last column).

I took a look at TransportClientNodesService and it seems to use a
random generator to choose the first node to send a query to. The
"random" generator is just an AtomicInteger, so maybe there's something
biased there.
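
To illustrate, the selection pattern is roughly the following (my
paraphrase, not the actual source):

// Paraphrase of the pattern described above: a shared counter walked
// round-robin over the known nodes, rather than a true random pick.
// java.util.concurrent.atomic.AtomicInteger, java.util.List,
// org.elasticsearch.cluster.node.DiscoveryNode
private final AtomicInteger counter = new AtomicInteger();

private DiscoveryNode nextNode(List<DiscoveryNode> nodes) {
    int i = (counter.getAndIncrement() & Integer.MAX_VALUE) % nodes.size();
    return nodes.get(i);
}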

The transport client will use a random generator and then just pick the
server to hit in a round-robin fashion. But then, once it hits that
server, a distributed search execution will go out from that server and
hit the shards (again, in a round-robin fashion).
