The number of clients in a search operation

Some inline comments below. Note that without seeing the entire Rally report it's not easy to understand the whole picture.
Esp. for service_time vs latency, the FAQ mentioned earlier is an essential read to understand the differences.

1. When I initially configured target-throughput:20 and clients:1, I received the following report:
max-throughput: 18.6 ops/s, 99th percentile latency: 2391 ms, 99th percentile service time: 97 ms

max-throughput not reaching the set target-throughput, together with the very high 99th percentile latency compared to the 99th percentile service time, indicates that the benchmark is unstable and the numbers are not very representative. Despite this, I have a few additional, more detailed comments:

With clients:1 you need to remember what I mentioned earlier, i.e. request-response is blocking.

So if your cluster is slow executing the queries, you can't achieve the configured target-throughput, and that is what happened here: the max throughput achieved was 18.6 ops/s, below the target-throughput of 20.
The median throughput would also be useful to check.
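To make the single-client bound concrete, here is a back-of-envelope calculation (the 18.6 ops/s figure is from the report above; the rest is arithmetic):

```python
# With a single blocking client, achieved throughput can never exceed
# 1 / (mean time per round trip). The reported 18.6 ops/s therefore
# implies a mean round trip of roughly:
achieved_ops_per_s = 18.6
mean_round_trip_ms = 1000 / achieved_ops_per_s
print(f"{mean_round_trip_ms:.1f} ms")  # ~53.8 ms, i.e. above the 50 ms
                                       # interval needed for 20 ops/s
```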

The 99th percentile service time (service_time is the time between issuing a request and receiving the response) tells us that 99% of requests took <=97ms to return.
It's good to keep in mind that the interpretation of these numbers depends on the number of iterations; e.g. if you had 500 iterations, the 99th percentile means that ~5 requests ended up with a service time >97ms. It would also be useful to see the other percentiles here, e.g. the 50th, to see how slow a larger % of operations were.
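As a sketch of that percentile interpretation (the service-time distribution below is made up; only the 500-iteration count comes from the example):

```python
import random

# Hypothetical service times (ms) for 500 iterations; in a real run these
# come from Rally's per-request measurements.
random.seed(42)
service_times = sorted(random.gammavariate(4.0, 15.0) for _ in range(500))

# 99th percentile via nearest-rank: the value below which 99% of samples fall.
p99 = service_times[int(0.99 * len(service_times)) - 1]   # index 494
slower = sum(1 for t in service_times if t > p99)
print(f"p99 = {p99:.0f} ms, requests slower than p99: {slower}")  # 5 requests
```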

For latency it's best to read the great analogy with the Barista and Coffee shop, mentioned by @danielmitterdorfer here
TL;DR If requests are constantly taking longer to execute than the calculated schedule, this piles up and can lead to high latency values [1].

2. When I then configured target-throughput:100 and clients:10, I received the following report:
max-throughput: 99 ops/s, 99th percentile latency: 395 ms, 99th percentile service time: 362 ms

Here you have more clients issuing parallel requests, so the blocking request-response cycle is less of a limit, and you can see that increasing your clients and target-throughput also increased the 99th percentile service time almost 4x, to 362 ms.

This shows that the more queries you send, the slower your cluster is able to service them.
Note that issuing search requests is a lightweight operation for Rally (there isn't much expensive IO in the background like with bulk for example).

From test 2, we can see the cluster can reach a throughput of 100; I wonder why the 99th percentile latency in test 1 is so large (2391 ms) even though the target-throughput is below 100?

In my case, I think the throughput of my cluster is relatively high, but the test results show a low throughput. So I wonder if the load generated by Rally is too low; I want to know the maximum number of clients for the search operation, and the maximum requests per second per client.

The number of clients depends on what you want to achieve; e.g. if you are benchmarking a 300ms SLA for your query responses under a load of 100 queries/s, you'll need at least 30 clients. Rally will schedule load among its clients to achieve the target-throughput and not more.
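The 30-client figure above follows from a simple concurrency bound (Little's Law); a quick sketch:

```python
import math

# Each blocking client can issue at most 1000 / sla_ms requests per second,
# so sustaining target_qps needs at least target_qps * sla_ms / 1000 clients.
target_qps = 100   # desired load: 100 queries/s
sla_ms = 300       # worst acceptable response time per query
min_clients = math.ceil(target_qps * sla_ms / 1000)
print(min_clients)  # 30
```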

I would also recommend that you enable an Elasticsearch metrics store so that you can have better visibility into each request. For each search operation-type, there will be metric records for latency, service_time and throughput to help you understand better what's happening over time than just the summary at the end.
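For reference, the Elasticsearch metrics store is enabled in Rally's configuration file (rally.ini); a minimal sketch of the [reporting] section, with placeholder host values (check the Rally docs for your version's exact keys):

```ini
[reporting]
datastore.type = elasticsearch
datastore.host = metrics.example.org
datastore.port = 9200
datastore.secure = false
datastore.user =
datastore.password =
```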

[1] With clients:1 and target-throughput:20, Rally will ensure that requests are only issued every 1/20 = 50 ms. If a large number of operations takes >50 ms to return, the delay piles up and inflates the latency figures.
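The pile-up described in [1] can be simulated in a few lines; the 54 ms per-request service time below is an assumption (any value above the 50 ms schedule interval shows the effect):

```python
schedule_ms = 1000 / 20   # target-throughput 20 with one client: a request
                          # is scheduled every 50 ms
service_ms = 54           # assumed constant service time, just above 50 ms

free_at = 0.0             # time at which the single blocking client is free
for i in range(500):
    scheduled = i * schedule_ms
    start = max(scheduled, free_at)   # can't start before the client is free
    free_at = start + service_ms
    latency = free_at - scheduled     # measured from the *scheduled* start

# Each request adds 4 ms of backlog, so latency grows linearly:
print(f"latency of the 500th request: {latency:.0f} ms")  # 2050 ms
```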