I'm using Elastic Cloud (ES 5.4.2) and the Python client. We currently have about 30M documents in our single-node cluster across 17 indices and 81 shards, with no memory pressure issues. I've been writing a Django front end for our application. Everything was going well until I took a break and returned to find that the first search on each page load takes about 30 seconds to complete. Subsequent searches on the same page load perform normally, including searches on different indices.
This is true even if the first search is extremely simple. In fact, it's true even if the first search has a syntax error! I can replicate the behavior in a test script running outside the Django environment:
from datetime import datetime

# create connection (create_es_connection is my own helper; args comes from argparse)
es = create_es_connection(args.credentials, args.environment)
print "start", datetime.now()
# first request on this connection: slow
es.get(index='index-name', doc_type='type-name', id=1221)
print "1221", datetime.now()
# second request on the same connection: fast
es.get(index='index-name', doc_type='type-name', id=1222)
print "1222", datetime.now()

# create a second, fresh connection and repeat the same two requests
es = create_es_connection(args.credentials, args.environment)
print "start", datetime.now()
es.get(index='index-name', doc_type='type-name', id=1221)
print "1221", datetime.now()
es.get(index='index-name', doc_type='type-name', id=1222)
print "1222", datetime.now()
In each case, the first request takes 25-35 seconds, while the second is nearly instantaneous, as expected. Additional searches, more complex searches, searches of other indices, and waiting several seconds between searches all produce identical results: only the first request on a new connection is slow, even when it produces a syntax error.
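To report the delay per request instead of reading it off wall-clock timestamps, something like the following could wrap each call. This is only a rough sketch: `timed` is a hypothetical helper, and `fake_get` is a made-up stand-in for `es.get` so the snippet runs without a cluster. It also times calls that raise, since the slowdown appears even when the first request fails.

```python
import time


def timed(label, fn, *args, **kwargs):
    """Call fn, print how long it took, and return (elapsed, result, error).

    Exceptions are caught so that a failing request is still timed.
    """
    start = time.time()
    result = None
    error = None
    try:
        result = fn(*args, **kwargs)
    except Exception as exc:
        error = exc
    elapsed = time.time() - start
    status = "ok" if error is None else "error: %r" % (error,)
    print("%-6s %8.3f s  %s" % (label, elapsed, status))
    return elapsed, result, error


def fake_get(index, doc_type, id):
    """Hypothetical stand-in for es.get; returns a canned document."""
    return {"_index": index, "_type": doc_type, "_id": id, "found": True}


# With a real client, pass es.get in place of fake_get.
first = timed("1221", fake_get, index="index-name", doc_type="type-name", id=1221)
second = timed("1222", fake_get, index="index-name", doc_type="type-name", id=1222)
```

Comparing the elapsed values for the first and second call on each new connection should show whether the delay belongs to the first request itself rather than to connection setup in my helper.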
What might have caused this behavior and how can I prevent it?