How can a document get take 38 seconds?

Hi,

We have an application which uses 5.1.2 java client with transport protocol and a 5.1.2 cluster. The app measures elastic response times and some get operations return in the order of tens of seconds.

An example shows 38036.18004 ms response time for this index for a simple document get:

curl -s 'http://localhost:9200/_cat/shards?v' | grep 'accounts_2 '
accounts_2       0     r      STARTED  2366203 745.4mb 10.6.145.198 fe05
accounts_2       0     p      STARTED  2366203 738.2mb 10.6.145.195 fe02
accounts_2       0     r      STARTED  2366203 740.4mb 10.6.145.197 fe04

The segments look like this (the index was created in 2.4 and later upgraded to 5.1):

curl -s 'http://localhost:9200/_cat/segments?v' | egrep '^index|accounts_2 '

http://pastebin.com/se7HwHud

I wonder: how can a document get last 38 seconds? At that time, the cluster was green.
There are other queries running while this happens and the statistical breakdown looks OK:

      N           Min           Max        Median           Avg        Stddev
x 2574735      0.328462      38036.18      1.101684     8.7738861     126.71908

Meaning from nearly 2.6M doc gets the slowest was 38 seconds, but the median shows a healthy 1.1 ms, so there are only some (but a constant amount of) slow queries.

While running on ES2.4, there weren't any such slow queries. The average was nearly the same as the median.

This is the same as How to find out why a document get is slow?, so feel free to add more to that thread if required, but creating more threads makes it harder to help :slight_smile: