`refresh=wait_for` taking unexpectedly long

I've started using refresh=wait_for after we upgraded to 5.3 (from 1.5) for user-facing parts of the application which require that the next search returns the modified documents.

The index refresh_interval is set to 100ms. When I index e.g. 10 documents (in a batch request) and use different values for refresh, this is my current outcome:

  • false (or leaving out): 0.1s (basically: it's indexed async)
  • wait_for: 4s
  • true: 1.4s

I also tried increasing refresh_interval to 1s and using e.g. refresh=true, but the timings didn't change.
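For completeness, this is roughly how such a refresh_interval change can be made per index through the PHP client; a minimal sketch only, with my_index standing in for our real index name:

<?php
// Minimal sketch (elasticsearch-php 5.x): raise refresh_interval on one index.
// 'my_index' is a placeholder, not our real index name.
require 'vendor/autoload.php';

$client = \Elasticsearch\ClientBuilder::create()->build();

$client->indices()->putSettings([
    'index' => 'my_index',
    'body'  => [
        'index' => ['refresh_interval' => '1s'],
    ],
]);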

It's unexpected to me that refresh=wait_for would take 4s here. 1.4s also isn't fast but that's an acceptable wait time for us currently.

Also, these are numbers from production, from a write-heavy application. That's why I never really saw this on dev environments, which lack that write-heavy nature.

I'm using the elasticsearch-php client library, and all index operations happen via \Elasticsearch\Client::bulk (a minimal sketch follows the list):

  • the body contains all the documents to be sent
  • and refresh contains one of the values above
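To make the call shape concrete, here is a sketch of such a bulk request; the index name, type and fields (my_index, post, title) are placeholders, not our real mapping:

<?php
// Minimal sketch (elasticsearch-php 5.x): a bulk request carrying the refresh parameter.
// Index name, type and document fields are placeholders.
require 'vendor/autoload.php';

$client = \Elasticsearch\ClientBuilder::create()->build();

$params = [
    'refresh' => 'wait_for',   // or 'true' / 'false'
    'body'    => [],
];

// Each document is an action/metadata line followed by its source.
foreach ([1 => 'first post', 2 => 'second post'] as $id => $title) {
    $params['body'][] = ['index' => ['_index' => 'my_index', '_type' => 'post', '_id' => $id]];
    $params['body'][] = ['title' => $title];
}

$response = $client->bulk($params);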

I thought I would only have to wait roughly the refresh_interval of 100ms, but obviously that assumption is flawed.

How does this work and what can I expect here? Can I improve refresh=wait_for performance without falling back to refresh=true?

thanks!

The translog is still fsynced unless you turned that off. But, yeah, the search data structures aren't built synchronously with your request.
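For context, the fsync behaviour referred to here is controlled by the index.translog.durability setting. As a sketch only, this is what "turning that off" would mean (my_index is a placeholder, and relaxing this trades durability for speed, so it isn't a recommendation):

<?php
// Minimal sketch (elasticsearch-php 5.x): index.translog.durability controls the fsync.
// 'request' (the default) fsyncs the translog before acknowledging each request;
// 'async' only fsyncs periodically. Placeholder index name; not a recommendation.
require 'vendor/autoload.php';

$client = \Elasticsearch\ClientBuilder::create()->build();

$client->indices()->putSettings([
    'index' => 'my_index',
    'body'  => [
        'index' => ['translog' => ['durability' => 'async']],
    ],
]);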

Right. You have to wait for the actual refresh to finish. Refreshes might be scheduled to kick off every 100ms, but each one has to wait for a thread on the refresh pool (see here for useful stuff like the size of the pool), and then the refresh has to actually finish. My guess, given that refresh=true finishes much faster than refresh=wait_for's time + refresh_interval, is that the refresh pool is stuffed. refresh=true uses the indexing thread pool for the refresh because it is synchronous with the indexing call, I believe. Can you have a look at the node stats and check whether that thread pool has a queue?
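A quick way to check that from the PHP client, as a sketch (assuming the client passes the metric parameter through to the node stats API):

<?php
// Minimal sketch (elasticsearch-php 5.x): inspect the refresh thread pool on each node.
// A growing 'queue' or non-zero 'rejected' would point at a saturated refresh pool.
require 'vendor/autoload.php';

$client = \Elasticsearch\ClientBuilder::create()->build();

$stats = $client->nodes()->stats(['metric' => 'thread_pool']);

foreach ($stats['nodes'] as $nodeId => $node) {
    $refresh = $node['thread_pool']['refresh'];
    printf(
        "%s refresh pool: active=%d queue=%d rejected=%d\n",
        $node['name'],
        $refresh['active'],
        $refresh['queue'],
        $refresh['rejected']
    );
}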

My SWAG is that you should raise the refresh interval on indices that you don't use with refresh=wait_for. You could raise the size of the thread pool but I'm not sure what side effect that would have.

Anything specific I should look out for? I see the refresh and index as well as the bulk pools:

{
  "thread_pool": {
    "bulk": {
      "threads": 10,
      "queue": 0,
      "active": 0,
      "rejected": 0,
      "largest": 10,
      "completed": 29826938
    },
    "fetch_shard_started": {
      "threads": 0,
      "queue": 0,
      "active": 0,
      "rejected": 0,
      "largest": 0,
      "completed": 0
    },
    "fetch_shard_store": {
      "threads": 0,
      "queue": 0,
      "active": 0,
      "rejected": 0,
      "largest": 0,
      "completed": 0
    },
    "flush": {
      "threads": 1,
      "queue": 0,
      "active": 0,
      "rejected": 0,
      "largest": 5,
      "completed": 453
    },
    "force_merge": {
      "threads": 1,
      "queue": 0,
      "active": 0,
      "rejected": 0,
      "largest": 1,
      "completed": 13
    },
    "generic": {
      "threads": 4,
      "queue": 0,
      "active": 0,
      "rejected": 0,
      "largest": 4,
      "completed": 726806
    },
    "get": {
      "threads": 10,
      "queue": 0,
      "active": 0,
      "rejected": 0,
      "largest": 10,
      "completed": 304772
    },
    "index": {
      "threads": 10,
      "queue": 0,
      "active": 0,
      "rejected": 0,
      "largest": 10,
      "completed": 15074
    },
    "listener": {
      "threads": 5,
      "queue": 0,
      "active": 0,
      "rejected": 0,
      "largest": 5,
      "completed": 147613
    },
    "management": {
      "threads": 5,
      "queue": 0,
      "active": 1,
      "rejected": 0,
      "largest": 5,
      "completed": 6501816
    },
    "refresh": {
      "threads": 5,
      "queue": 0,
      "active": 1,
      "rejected": 0,
      "largest": 5,
      "completed": 30810420
    },
    "search": {
      "threads": 16,
      "queue": 0,
      "active": 0,
      "rejected": 0,
      "largest": 16,
      "completed": 30804051
    },
    "snapshot": {
      "threads": 0,
      "queue": 0,
      "active": 0,
      "rejected": 0,
      "largest": 0,
      "completed": 0
    },
    "warmer": {
      "threads": 5,
      "queue": 0,
      "active": 2,
      "rejected": 0,
      "largest": 5,
      "completed": 63761738
    },
    "watcher": {
      "threads": 0,
      "queue": 0,
      "active": 0,
      "rejected": 0,
      "largest": 0,
      "completed": 0
    }
  }
}

This snapshot is from right now and I guess there isn't much going on.

Would there be a benefit in not using _bulk all the time?

The implementation can always index multiple documents (of the same type, in my case) at once, and indexing a single document isn't a special case => it's just a _bulk with one document. However, the majority of indexing requests deal with only a single document.

Looking at the node stats, this is clearly reflected in the completed counts of the bulk vs. index pool. TBH I'm surprised there are even ~15k index.completed, because we have no code which performs such operations, but maybe that's just something else (Kibana?).
OTOH, when I read your link about the pools, index and bulk have the same default configuration, so this may be a dead end anyway.

Thanks, I actually did that already in the meantime. Since yesterday I have completely removed all refresh=wait_for usages and bumped the refresh_interval to 1s.

In certain areas, user interactions now directly cause index requests with refresh=true. Since users are hard to control, this could be anything from 0 to 100 concurrent requests.

I expected this not to go smoothly, but I haven't detected any real problems so far. CPU load is still stable (but fluctuating more; the change was deployed around 14:00 in that chart):

Should I be more careful with too many concurrent index operations performing refresh=true?

Thank you very much so far!

Looks like just about nothing. 1 active refresh thread and 2 active warmer threads.

One curious thing is that you have 30 million completed search tasks, 30 million completed refresh tasks, and 63 million completed warmer tasks. I'm honestly not sure if that is normal or not! It probably is. Can you get the index stats for warmers on the indices you are using refresh=wait_for on? I wonder how long they are taking.
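Something along these lines should pull just the warmer section; a sketch, assuming an index called my_index and that the client exposes the metric parameter on the indices stats call:

<?php
// Minimal sketch (elasticsearch-php 5.x): fetch only the warmer part of the index stats.
require 'vendor/autoload.php';

$client = \Elasticsearch\ClientBuilder::create()->build();

$stats = $client->indices()->stats([
    'index'  => 'my_index',
    'metric' => 'warmer',
]);

// Number of warmer runs and total time spent warming, summed over all shards.
$warmer = $stats['indices']['my_index']['total']['warmer'];
printf(
    "warmer runs: %d, total: %d ms, avg: %.1f ms per run\n",
    $warmer['total'],
    $warmer['total_time_in_millis'],
    $warmer['total'] > 0 ? $warmer['total_time_in_millis'] / $warmer['total'] : 0
);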

It looks like warmers usually don't do a lot of work but if you are using nested documents then they will.

Just that _bulk and _index are separate thread pools with separate queues, is all. I had an application that only used _bulk even if it had just one thing to update; I bumped the size of the bulk thread pool and everything was fine.

0-100 is probably not too big a deal. One issue you may have is that you'll end up with very small segments because of the forced refreshes so you'll see more load spent on merging. It might not matter. refresh=wait_for won't change the expected segment size.

Here they are; TBH I'm not sure what these numbers should tell me (I mean, yes, the individual numbers are clear, but I fail to see a conclusion here :expressionless:):

        "warmer": {
          "current": 1,
          "total": 1839115,
          "total_time_in_millis": 1062636977
        },

Hmm, yes, I'm using nested on two types. This nesting however is basically always 1:1 (it never contains multiple nested docs; at best exactly one, at worst none).

Also, I'm heavily using parent/child documents. This is absolutely required because we're storing posts and comments; we used to store comments directly inside posts, but that wasn't really feasible with 100k-comment threads ... :slight_smile:

Currently I index with refresh=true directly (after the changes mentioned above). I'm wondering the following:

  • using this approach, I can hardly control how many refresh=true requests happen concurrently, because the indexing has to be performed no matter what
  • what if I separate _bulk from a manual /_refresh call
  • and come up with a mechanism so it doesn't get executed more than once every 100ms (a sketch of what I mean follows below)

As I understand it, no matter how many index operations I have sent, when I send /_refresh all of them will be made available for search, so a manual but throttled /_refresh call could mean much less stress on the system?
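Roughly what I have in mind, as a sketch only; the index name and the 100ms window are placeholders, and in a real setup the throttle state would have to be shared across workers (APCu, Redis, ...):

<?php
// Minimal sketch: trigger at most one manual /_refresh per time window.
// Placeholder index name; the static throttle state only works within one process.
require 'vendor/autoload.php';

function throttledRefresh(\Elasticsearch\Client $client, $windowSeconds = 0.1)
{
    static $lastRefresh = 0.0;

    $now = microtime(true);
    if ($now - $lastRefresh < $windowSeconds) {
        return false; // skipped: a refresh already ran within this window
    }
    $lastRefresh = $now;

    $client->indices()->refresh(['index' => 'my_index']);
    return true;
}

$client = \Elasticsearch\ClientBuilder::create()->build();

// Index with refresh left off (i.e. false), then ask for a throttled refresh.
$client->bulk([
    'body' => [
        ['index' => ['_index' => 'my_index', '_type' => 'post', '_id' => 1]],
        ['title' => 'hello'],
    ],
]);
throttledRefresh($client);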

But TBH at the moment there's no identifiable bottleneck, and this may just be premature anyway.

Anyway, the only way to make an improvement from here on would be to find something which would reduce the ~1.4s when batch indexing 10 docs.

Thanks for your detailed responses (here and in the other thread), I learned so much from this!
