Significant time changes during index benchmarking

I have set up a benchmarking cluster in another AWS region and I'm getting some really weird results. I bulk loaded content into the index from the file output of stream2es. The node setup I started with was 1 data node with 30 GB of heap assigned to ES. There are many parent/child relationships within this index, which is one of the reasons we are trying to get information about search times. We are trying to figure out whether there is a cluster setup that would be optimal for this particular index "shape".
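
For context, the load is just replaying the dump through the bulk API. A minimal sketch, assuming the stream2es file output is (or has been converted to) newline-delimited bulk actions; the host, file name, and chunk size are placeholders:

```python
import requests

BULK_URL = "http://localhost:9200/_bulk"  # placeholder host
CHUNK_LINES = 10000  # action + source lines per request; tune to taste

def send_chunk(lines):
    body = "".join(lines)
    resp = requests.post(BULK_URL, data=body,
                         headers={"Content-Type": "application/x-ndjson"})
    resp.raise_for_status()

# "stream2es-dump.ndjson" is a stand-in for the actual dump file
with open("stream2es-dump.ndjson") as f:
    chunk = []
    for line in f:
        chunk.append(line)
        if len(chunk) >= CHUNK_LINES:
            send_chunk(chunk)
            chunk = []
    if chunk:
        send_chunk(chunk)
```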

I have a list of 151 searches that I run through the system to get a baseline average of how long the searches take. I run this set of searches a total of 100 times in the hope of priming the cache and getting more accurate times. After the initial bulk load the results are really good, the numbers we want: on average we get 200 ms responses. So my next step was to decrease the heap size, because we do very little full-text searching and mostly parent/child queries, in the hope that the file cache would be utilized better. The results from that test were terrible.
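
Roughly, the benchmark harness looks like this (a sketch; `searches.json`, the host, and the index name stand in for our real ones):

```python
import json
import time
import requests

SEARCH_URL = "http://localhost:9200/myindex/_search"  # placeholder
ITERATIONS = 100

with open("searches.json") as f:
    searches = json.load(f)  # a list of 151 query bodies

timings = []
for _ in range(ITERATIONS):
    for body in searches:
        start = time.perf_counter()
        resp = requests.post(SEARCH_URL, json=body)
        resp.raise_for_status()
        # wall-clock time; the "took" field in the response could be
        # used instead to exclude network overhead
        timings.append(time.perf_counter() - start)

print("searches: %d, mean: %.1f ms" %
      (len(timings), 1000 * sum(timings) / len(timings)))
```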

So here is the meat...

If I turn the heap back up to 30 GB and run the exact same set of searches, I get on average 2 s response times. I assume something gets "primed" during the bulk load that we aren't doing or forcing after the later changes, and I wonder if there is something I'm missing. How can we check what the difference is between these two states when, from a configuration standpoint, they are the same?
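
One way I can think of to compare them is to snapshot the stats APIs in both states and diff the results; a sketch (the index name is a placeholder):

```python
import json
import requests

HOST = "http://localhost:9200"  # placeholder

def snapshot(label):
    nodes = requests.get(HOST + "/_nodes/stats/jvm,os,indices").json()
    index = requests.get(HOST + "/myindex/_stats").json()
    with open("stats-%s.json" % label, "w") as f:
        json.dump({"nodes": nodes, "index": index}, f, indent=2)

snapshot("after-bulk-load")  # run while searches are fast
# ... change heap, restart, rerun the benchmark ...
snapshot("after-restart")    # run while searches are slow
```

Diffing the two files should at least show whether cache sizes (fielddata, filter cache), segment counts, or memory figures differ between the fast and slow states.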

What is the RAM available on that machine? Also, do you have indexing happening while you search, or is the system idle in that case?

I have a total of 60 GB on the machine, 30 GB of that assigned to ES_HEAP, and there is no indexing going on during my benchmarking; that was going to be the next test once I figured out these numbers. To be more open about our numbers: the average of 2 s per search is what we see in production, and the 200 ms responses we see after a bulk import in the testing environment are the anomaly we are trying to explain. Why do we get such good results, the results we want in production, after the bulk import?
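
Since the other 30 GB is left to the OS, one thing I can check in both states is how much of it is actually holding index files in the page cache. A quick Linux-only check, reading /proc/meminfo:

```python
def meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = value.strip()
    return info

m = meminfo()
print("MemTotal:", m["MemTotal"])
print("Cached:  ", m["Cached"])  # page cache, where index files live
```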

:) We also ran into a similar issue, or I should say we have the same issue. What we noticed is that the search request rate in PROD (~0.02 q/s) is much lower than what we have in the test environment (~5 q/s).