Increasing ES search throughput

I have a 15-node cluster with 14 data nodes and 1 master node. We are
indexing about 300M documents with 7 shards and 1 replica. Each node has
24 GB of RAM. Right now I am using mmapfs for the index store, and ES is
using only 3 GB of RAM (out of the 6 GB assigned).

I am using simple text queries for search, and we get an average response
time of 200 ms. However, our throughput is limited to about 100 QPS at 20
concurrent threads. Increasing the number of threads beyond that doesn't
seem to help, as CPU load reaches 100%.
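
A quick sanity check on these figures, using Little's law (concurrency = throughput x latency) rather than anything Elasticsearch-specific: 20 client threads that each wait about 200 ms per request cannot issue much more than 100 requests per second, which matches the ceiling described above. This is only a sketch; the helper name `max_qps` is made up for illustration, and it assumes each thread sends its next query only after the previous one returns.

```python
def max_qps(concurrent_threads: int, avg_latency_s: float) -> float:
    """Upper bound on throughput for a closed-loop client (Little's law)."""
    if avg_latency_s <= 0:
        raise ValueError("latency must be positive")
    return concurrent_threads / avg_latency_s


if __name__ == "__main__":
    # Figures from the post: 20 threads, 200 ms average response time.
    print(max_qps(20, 0.2))
    # If CPU is saturated, doubling the threads tends to double the latency,
    # so the bound stays where it was.
    print(max_qps(40, 0.4))
```

Read this way, adding threads only helps while latency stays flat; once the CPUs are at 100%, the per-query cost has to come down for throughput to rise.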

Any suggestions on how to improve the throughput?

--

You can increase the number of replicas and see if that helps.
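
For reference, the replica count can be changed on a live index through the index settings API. The sketch below only builds the request and does not send it; the host `localhost:9200` and the index name `myindex` are placeholders, not values from this thread.

```python
import json
import urllib.request


def replica_settings_request(base_url: str, index: str,
                             replicas: int) -> urllib.request.Request:
    """Build (but do not send) a PUT that changes number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder host and index name.
    req = replica_settings_request("http://localhost:9200", "myindex", 2)
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # To apply it against a real cluster: urllib.request.urlopen(req)
```

Unlike the shard count, this setting does not require reindexing, so it is cheap to experiment with.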

The other option I see is to use routing to direct the query to the shard
that contains your results. Here's a video about doing that and some more:

--

Thanks for the reply. I tried adding more replicas; it helps a little
(from 100 to 125 QPS), but my CPU usage goes up really high.

I don't know how we can implement routing, since we are using an analyzer
on the String field and searching on it. I was under the impression that I
can't use routing for analyzed fields.

We did manage to reduce GC cycles by increasing the size of the heap's new
(young) generation.

--

On Thursday, August 16, 2012 8:24:53 PM UTC+3, sumit wrote:

Thanks for the reply. I tried adding more replicas; it helps a little
(from 100 to 125 QPS), but my CPU usage goes up really high.

I would try fewer shards and more replicas, if reindexing is an option
and the indexing load isn't very heavy. I'd also run this sort of
performance testing in a separate environment.
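
The reasoning behind that suggestion can be put into numbers. A search without routing runs on one copy of every shard, so the shard count sets how much work each query fans out into, while replicas add copies that can serve different queries in parallel. The sketch below uses the figures from this thread (7 shards, 1 replica, 14 data nodes); the alternative layout of 2 shards with 6 replicas is purely hypothetical, chosen only because it also yields 14 shard copies.

```python
def shard_copies(primary_shards: int, replicas: int) -> int:
    """Total shard copies in the cluster: each primary plus its replicas."""
    return primary_shards * (1 + replicas)


def shard_searches_per_query(primary_shards: int) -> int:
    """An unrouted search runs on one copy of every shard."""
    return primary_shards


if __name__ == "__main__":
    layouts = {
        "current (from the thread)": (7, 1),
        "hypothetical alternative": (2, 6),
    }
    for name, (shards, replicas) in layouts.items():
        print(name,
              "| copies:", shard_copies(shards, replicas),
              "| shard searches per query:", shard_searches_per_query(shards))
```

Both layouts place one shard copy on each of the 14 data nodes, but the second one touches far fewer nodes per query. Whether that wins in practice depends on shard size and query cost, so it is worth measuring on a test cluster.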

Also, I would optimize the index regularly.
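
As a rough illustration, optimizing is a single REST call that merges an index down to fewer segments. The sketch below only builds the request; the host and index name are placeholders, and the `_optimize` endpoint is the one from Elasticsearch releases of this era (later versions expose the same operation as `_forcemerge`, which is why the endpoint is a parameter here).

```python
import urllib.parse
import urllib.request


def optimize_request(base_url: str, index: str, max_num_segments: int = 1,
                     endpoint: str = "_optimize") -> urllib.request.Request:
    """Build (but do not send) a POST that merges the index's segments."""
    query = urllib.parse.urlencode({"max_num_segments": max_num_segments})
    url = f"{base_url.rstrip('/')}/{index}/{endpoint}?{query}"
    return urllib.request.Request(url=url, method="POST")


if __name__ == "__main__":
    # Placeholder host and index name.
    req = optimize_request("http://localhost:9200", "myindex")
    print(req.get_method(), req.full_url)
    # To apply it against a real cluster: urllib.request.urlopen(req)
```

Merging is I/O-heavy, so it is usually run during quiet periods rather than while indexing at full rate.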

I don't know how we can implement routing, since we are using an analyzer
on the String field and searching on it. I was under the impression that I
can't use routing for analyzed fields.

As far as I understand, the routing value itself is not analyzed, which is
different from how your "data" fields are mapped. But I might be wrong, so I
suggest trying it with some sample data to see if you can make it work.
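
A small experiment along those lines could look like the sketch below. It assumes the documents have some natural partition key (the value `customer-42` is invented for illustration), and it uses `zlib.crc32` purely as a stand-in for Elasticsearch's own hash function, so the shard numbers it prints will not match a real cluster. The point it illustrates is that the routing value is hashed as an opaque string, independently of any analyzer on the searched fields, and that the same value has to be passed in the `routing` parameter at index time and at search time.

```python
import urllib.parse
import zlib


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value to a shard number.

    zlib.crc32 is a stand-in; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def routed_url(base_url: str, index: str, path: str,
               routing_value: str) -> str:
    """Build a URL carrying the routing parameter (index or search call)."""
    query = urllib.parse.urlencode({"routing": routing_value})
    return f"{base_url.rstrip('/')}/{index}/{path}?{query}"


if __name__ == "__main__":
    key = "customer-42"  # hypothetical partition key
    print("shard:", pick_shard(key, 7))
    # Placeholder host, index, type and document id.
    print(routed_url("http://localhost:9200", "myindex", "doc/1", key))
    print(routed_url("http://localhost:9200", "myindex", "_search", key))
```

With routing, a search runs on a single shard instead of all 7, but it only finds documents that were indexed with that routing value, so it fits only if queries can be scoped to such a key.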

--