How to find out why a document get is slow?

Hi,

After upgrading from 2.4 to 5.1.2, we can see a tremendous increase of response times in simple document gets (from java's get API over transport protocol).
Previously -with 2.4-, the response times were in the range of few millisecs (1-5), now there are responses above 1-2 seconds, but only a few. The median seems to be OK.

The slow query doc says that there is no slow query logging for exact gets, just indexing and searches.
So what's the preferred method of finding out why these get queries are slow?

Hey,

if you run the GET API, then no query is issued, so no slow query entry is being created. You can use the hot threads API to check what causes high CPU usage.

Also nodes stats APIs might be worth taking a look (and comparing against your old system).

--Alex

Hi,

Hot_threads look like this:
http://pastebin.com/sntJza2s
node stats:
http://pastebin.com/gFfGTUXr

Sadly, the upgrade was in-place, so I have no textual output for the old system.

Thanks,

Hey,

due to the amount of data I had only a quick look. You are spending a lot of time in warmers. to create parent child data structures. Why is that? Could you use eager_global_ordinals instead?

--Alex

Can I narrow this to at least an index, or mapping? We don't use parent child explicitly (I mean I don't know whether elasticsearch uses it internally somehow), although we've had a mapping where it was defined, but left unused.
So I can't answer this question. It would be good to see in what index/mapping does this happen.

Also, in ES2.4 we couldn't see this behaviour (I don't have a hot_threads dump, but the response times were quick). It started to appear with the upgrade to 5.1.2. Is there something, which has changed considerably in this area?

Hey,

wait, something is weird here. You said you are using Elasticsearch 5.1.2, but warmers were removed in 5.1. Are you sure you dont have another version running... I'm confused now :slight_smile:

--Alex

I have also found that strange, when I've checked the docs which said they were deprecated in 2.3.0.
Here's what I have:

curl -s 'http://localhost:9200/_cat/nodes?v&h=v,id'

v id
5.1.2 9YnN
5.1.2 wzS5
5.1.2 mTac
5.1.2 -_zl
5.1.2 uZ2l
5.1.2 Zhv2
5.1.2 ebJt
5.1.2 HEI1
5.1.2 sPrn
5.1.2 Gwgn
5.1.2 USXH
5.1.2 kxzX
5.1.2 Cksl
5.1.2 H3w1
5.1.2 XdeO
5.1.2 q8Wx
5.1.2 1qbk
5.1.2 X_QX
5.1.2 GauH
5.1.2 43Gj
5.1.2 1T6O
5.1.2 Wyje
5.1.2 b6L9
5.1.2 HC5i
5.1.2 g6XG
5.1.2 BojW

BTW, the indexes were created in 2.x.

That looks good. You also said that you dont use parent-child. Elasticsearch does not use this unless it is configured explicitely. You can use the _mapping endpoint to get the mapping of all indices.

I have a mapping defined with parent-child, but no documents were inserted with that.
This is still in the index:

  "messages_29" : {
    "mappings" : {
      "message_search_helper" : {
        "dynamic" : "strict",
        "include_in_all" : false,
        "_parent" : {
          "type" : "message"
        },
        "_routing" : {
          "required" : true
        },
        "properties" : {
          "searchId" : {
            "type" : "long"
          }
        }
      },

But no docs are stored with it:

curl -s 'http://localhost:9200/messages_29/message_search_helper/_count' 
{"count":0,"_shards":{"total":1,"successful":1,"failed":0}}

BTW, I'm also confused. If there are no warmers, why do I see those in the hot_threads output?

Hey,

so there are still warmers for some tasks, like the parent/child handling. Can you delete that parent child index (it seems to be empty anyway) and see if those warmers and their CPU usage go away (just a guess here)?

Also, is there constant CPU usage by the warmers? Like all the time? Have you run a couple of more hot threads requests?

Deeply wondering how any warmers can consume CPU, but you dont have any indices with parent child, that contain data

--Alex

Hi,

Sadly, we have other, valuable data in those indexes. But we will reindex those to see what changes.
BTW, the same was the case with ES2.4 and there the performance was OK. Do you know about any changes which could cause this?

I will report back with the findings, when the indexes got reindexed.

By reindexing all indexes, things looks better. The warmers' high CPU usage has disappeared, now hot_threads show only some merge tasks, or snapshot-related stuff (backups possibly).
Also, after the reindex, extremely long get responses are gone, now I can see this for today:

       N           Min           Max        Median           Avg        Stddev
x 89971255      0.255709     3056.4011      1.165522     2.5856993     17.446189

So it seems that parent-child mapping, which never had any doc caused some of the troubles.
Thanks,

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.