Many slow queries and high CPU load during load test (50 req/sec)

Hello everyone,
I have some problems running Elasticsearch in production.

Before deploying the application to production, we want to run load tests on a beta machine to make sure it can handle the traffic.

The beta machine's hardware specs are as follows:

CPU vendor: Intel
CPU model: Xeon (2001 MHz)
CPU total logical cores: 24
CPU cache: 15kb

VM name: Java HotSpot(TM) 64-Bit Server VM
VM vendor: Oracle Corporation
VM version: 25.31-b07
Java version: 1.8.0_31

Index Settings and Mappings:

Settings: http://pastebin.com/Juin26k8

Mappings: http://pastebin.com/W3CqtiFU

Sample Query

http://pastebin.com/W5PgRVf7

After running these queries with similar terms at around 50-60 requests per second, search requests suddenly get slower, jumping from about 60 ms to 2-3 seconds.

One of the lines from slow_search.log:

[metadatav3][1] took[1.9s], took_millis[1996], types[track], stats[], search_type[DFS_QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":20,"timeout":5000,"query":{"function_score":{"query":{"bool":{"must":{"match":{"fullString":{"query":"Charlie Feathers","type":"boolean","operator":"OR","minimum_should_match":"6","cutoff_frequency":0.01}}},"must_not":{"term":{"isHidden":"true"}},"should":[{"match":{"fullString.token":{"query":"Charlie Feathers","type":"phrase"}}},{"match":{"fullString.edgeNGNoSplit":{"query":"Charlie Feathers","type":"phrase"}}}]}},"field_value_factor":{"field":"popularity"}}},"explain":false}], extra_source[],
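For readability, here is the query body from that slow-log entry, re-indented (same JSON, unchanged):

  {
    "from": 0,
    "size": 20,
    "timeout": 5000,
    "query": {
      "function_score": {
        "query": {
          "bool": {
            "must": {
              "match": {
                "fullString": {
                  "query": "Charlie Feathers",
                  "type": "boolean",
                  "operator": "OR",
                  "minimum_should_match": "6",
                  "cutoff_frequency": 0.01
                }
              }
            },
            "must_not": {
              "term": { "isHidden": "true" }
            },
            "should": [
              { "match": { "fullString.token": { "query": "Charlie Feathers", "type": "phrase" } } },
              { "match": { "fullString.edgeNGNoSplit": { "query": "Charlie Feathers", "type": "phrase" } } }
            ]
          }
        },
        "field_value_factor": { "field": "popularity" }
      }
    },
    "explain": false
  }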

My main questions, which I do not fully understand:

1. We are going to have two nodes, but the index is configured with 5 shards. We are leaving it at 5 shards because we might add more nodes in the future.
Could the shard configuration cause this CPU load?

2. The analyzers and the query do not seem complex to me, but I am not really an expert.
Could the query and analyzer settings cause this CPU load?

I also found this thread, but it did not actually help me:
https://groups.google.com/forum/#!searchin/elasticsearch/load$20test/elasticsearch/8wZgMdEzHvk/i0PxqRNYYm4J

Maybe have a look at the hot threads before and after the slowdown to try to understand what changes.
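Something along these lines should capture them (a sketch; adjust host/port as needed):

  # capture hot threads while requests are still fast
  curl -s 'localhost:9200/_nodes/hot_threads?threads=3' > hot_threads_before.txt
  # ...run the load test until latency degrades, then capture again...
  curl -s 'localhost:9200/_nodes/hot_threads?threads=3' > hot_threads_after.txt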

Hey Adrien,
Here is the hot thread dump.

Also, if you want, I can send a copy of the full thread dump.

Could the shard configuration cause this CPU load?

In this case it is likely not the shard configuration that is causing the CPU load.

Could the query and analyzer settings cause this CPU load?

It's possible. A good way to test this is to split your query into its separate parts: for example, run the field_value_factor as one query, the fullString match as another, the fullString.token phrase match as another, and so on.

Then you can run each query by itself and see which is taking the bulk of the time.
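For example, the pieces could be benchmarked individually along these lines (a sketch based on the slow-log query above; index and type names taken from it):

  # 1. just the main fullString match
  curl -s 'localhost:9200/metadatav3/track/_search' -d '{
    "size": 20,
    "query": {
      "match": {
        "fullString": { "query": "Charlie Feathers", "cutoff_frequency": 0.01 }
      }
    }
  }'

  # 2. just the fullString.token phrase match
  curl -s 'localhost:9200/metadatav3/track/_search' -d '{
    "size": 20,
    "query": { "match_phrase": { "fullString.token": "Charlie Feathers" } }
  }'

  # 3. just the function_score / field_value_factor part on a match_all
  curl -s 'localhost:9200/metadatav3/track/_search' -d '{
    "size": 20,
    "query": {
      "function_score": {
        "query": { "match_all": {} },
        "field_value_factor": { "field": "popularity" }
      }
    }
  }'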

The hot threads look sane. When the throughput suddenly decreases, is your machine still fully utilized (CPU and/or I/O at 100%) or does it look idle? Can you correlate the slowdown with garbage collection activity?
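For example, GC collector counts and times can be pulled from the node stats API before and after a run (a sketch; adjust host/port):

  # per-node heap usage and GC collector counts/times
  curl -s 'localhost:9200/_nodes/stats/jvm?pretty'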

I will try these queries separately and share the results.

What do you mean by throughput decreases? Do you mean that responses start taking longer? If so, then yes, the CPU stays fully utilized during the load test; when I check with the "top" command I can see that %CPU is around 2200-2300.
If you want, I can also grab a screenshot from the Marvel plugin during the load test for garbage collection monitoring.

That could be helpful, thanks.

Hello Adrien, I am attaching query.json, a Marvel screenshot taken during a load test, and detailed Gatling output for the load test.

I do not know whether you are familiar with the Gatling tool, but there is an index.html under the folder (a2-1446748547160); you can view the details there if you would like.
I will be online for the next 4-5 hours, so you can ask me anything or request another load test with different parameters.

Hey Adrien, do you need any additional info?
Also, I might have found the reason for the really high load average:
the search threadpool type (threadpool.search.type) was set to cached. I guess this type is unbounded, which is why it generated almost 2K threads; I think context switching might have been occupying the CPU instead of the actual search work.

However, we now get EsRejectedExecutionException, probably due to the fixed threadpool and its bounded queue size.
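For reference, the override looks roughly like this in elasticsearch.yml (a sketch; the exact size and queue values here are assumptions, not necessarily the ones we used):

  threadpool.search.type: fixed        # was: cached (unbounded)
  threadpool.search.size: 24           # e.g. one thread per logical core (assumption)
  threadpool.search.queue_size: 1000   # requests beyond this get rejected (assumption)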

I am still confused; does that mean I need new machines if I still want to use ngrams?
Mehmet -

Wow, indeed cached is not a good type for the search threadpool, and 2k threads can easily cause issues with your scheduler. We recently changed thread pool types to not be updateable to avoid such problems: https://github.com/elastic/elasticsearch/pull/14367

The EsRejectedExecutionException tells you that you are sending more requests than Elasticsearch can handle: it means that not only are all threads of the search thread pool busy, but the queue is also full. When this happens, you need to stop sending requests for a while so that space frees up in the queue, and then start again. The same mechanism is used at index time when indexing too many documents concurrently.
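If it helps, you can watch the search pool's active/queue/rejected counters while the test runs (a sketch; column names per the cat API):

  # shows active threads, queued requests and rejections for the search pool
  curl -s 'localhost:9200/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected'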

So what is your suggestion about ngrams?
I am actually shocked, and a few possible ideas came to mind:

  1. Everyone using ngrams adds machines to support heavy loads.
  2. Nobody cares about more than 50 req/sec.
  3. I am really missing a point.

Have you checked the GC graphs?
I saw some young-generation GC activity during these loads; is that normal?