Refresh is very slow

cyberhuman · May 22, 2018, 1:18am

Hello,

I have an issue with refreshes in an ES cluster.

I have a process that does bulk indexing and a few other processes that do multiget requests. There is also an API that bulk-writes some documents to ES and must ensure the data is searchable before it returns. Calls to the API are not very frequent, but it is important that they complete as quickly as possible. To meet the requirement that the data is searchable when the API call returns, I tried several approaches, none of which worked well:

1. Adding refresh=true to bulk requests: the requests always time out if other indexing is going on.
2. Refreshing using the refresh API: the refresh always times out (I even tried refreshing with curl and waited for up to an hour; the request never completed!).
3. Searching with the ids query and checking that the newly written data is returned (in the hope that automatic refresh helps): it takes ridiculously long to get the fresh data back (more than 30s with refresh_interval set to 1s).
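
For illustration, the third approach can be sketched roughly as below. This is a minimal sketch, not the actual code in use: `wait_until_searchable`, `ids_query_body` and the injected `search` callable are hypothetical names, and it assumes the bulk response supplies each document's `_version`, so that an update is only counted as visible once the search returns at least that version.

```python
import time


def ids_query_body(doc_ids):
    """Build a search body matching only the given ids, returning versions."""
    ids = sorted(doc_ids)
    return {
        "query": {"ids": {"values": ids}},
        "size": len(ids),
        "version": True,   # ask for _version on every hit
        "_source": False,  # the document body is not needed here
    }


def wait_until_searchable(search, expected_versions, timeout=10.0,
                          poll_interval=1.0, clock=time.monotonic,
                          sleep=time.sleep):
    """Poll until every document is searchable at its expected version.

    `search` is a hypothetical callable: it takes a request body and returns
    the parsed JSON response of the index's _search endpoint.
    `expected_versions` maps document id -> minimum _version (assumed to be
    taken from the bulk response).
    Returns True when all documents are visible, False on timeout.
    """
    pending = dict(expected_versions)
    deadline = clock() + timeout
    while pending:
        response = search(ids_query_body(pending))
        for hit in response["hits"]["hits"]:
            wanted = pending.get(hit["_id"])
            if wanted is not None and hit.get("_version", 0) >= wanted:
                del pending[hit["_id"]]
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
    return True
```

Passing `clock` and `sleep` in as parameters is only a convenience so the loop can be exercised without a cluster or real waiting.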

The refresh_interval is set to 1s (it was initially 5s, but I switched to 1s to test approach #3; it makes no real difference to the result). There are 1000 shards in the index, spread over 170 nodes. As far as I can see, no process is CPU- or disk-bound.

How do I find out what's happening? What could block forced refreshes, and why is refresh_interval not honored?

Thanks,
Raman

warkolm · May 22, 2018, 2:28am

What version are you on?
What sort of hardware?

Why so many shards?

Are you using the free Monitoring functionality to correlate refreshes with other measurements?

cyberhuman · May 22, 2018, 4:37am

It is ES 6.2.1
The hardware is quite good:
Intel Xeon Silver 4114 CPU (40 cores @ 2.2GHz)
3x 2TB Intel SSD
128GB RAM (Java heap is 30GB)
10Gbit network

There are so many shards because the amount of data stored is quite large: 6.5 billion documents occupying around 67TB of disk space.
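
As a rough back-of-the-envelope check of what these figures mean per shard and per node, here is a small sketch. It assumes decimal units (1TB = 1000GB) and simply divides the totals evenly; whether the 1000 shards and the 67TB include replicas is not stated, so the results are only averages under that uncertainty. The function name `cluster_averages` is made up for this example.

```python
def cluster_averages(total_docs, total_tb, shards, nodes):
    """Evenly divide the stated totals across shards and nodes."""
    return {
        "docs_per_shard": total_docs / shards,
        "gb_per_shard": total_tb * 1000 / shards,  # assumes 1TB = 1000GB
        "shards_per_node": shards / nodes,
        "tb_per_node": total_tb / nodes,
    }


# Figures quoted in this thread: 6.5 billion docs, ~67TB, 1000 shards, 170 nodes.
averages = cluster_averages(6_500_000_000, 67, 1000, 170)
for name, value in averages.items():
    print(f"{name}: {value:,.2f}")
```

With the numbers above this works out to about 6.5 million documents and about 67GB per shard, and roughly 6 shards and 0.4TB per node on average.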

We don't have monitoring deployed. What kind of measurement would be interesting to have?

cyberhuman · May 22, 2018, 4:39am

Is there anything I could check right away?
For example, when I issue a refresh request it just blocks forever. Is there a way to check what is going on and why it is blocked?
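
One possible way to look at this right away, sketched under assumptions: the task management API, the nodes hot threads API and the cat thread pool API can each be queried over HTTP while a refresh is hanging. The helper names and the `http://localhost:9200` address below are hypothetical; only the endpoint paths and parameters are meant to match Elasticsearch.

```python
import urllib.parse
import urllib.request


def diagnostic_urls(base_url):
    """Return URLs for three read-only diagnostics related to refresh."""
    base = base_url.rstrip("/")

    def url(path, params):
        return base + path + "?" + urllib.parse.urlencode(params)

    return {
        # Running tasks whose action name contains "refresh".
        "refresh_tasks": url("/_tasks",
                             {"actions": "*refresh*", "detailed": "true"}),
        # Stack samples of the busiest threads on every node.
        "hot_threads": url("/_nodes/hot_threads", {"threads": "5"}),
        # Active, queued and rejected counts for the refresh thread pool.
        "refresh_pool": url("/_cat/thread_pool/refresh",
                            {"v": "true",
                             "h": "node_name,name,active,queue,rejected"}),
    }


def fetch(url, timeout=30):
    """GET a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical address; replace with a node of the cluster.
    for name, target in diagnostic_urls("http://localhost:9200").items():
        print(f"== {name}: {target}")
        print(fetch(target))
```

If the refresh tasks stay listed for a long time, the hot threads output from the nodes they run on may show what those threads are waiting for.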

Christian_Dahlqvist · May 22, 2018, 5:11am

Are you indexing new documents or updating existing ones? Are you using nested documents or parent-child relationships?

cyberhuman · May 22, 2018, 6:15am

It's mostly updates to existing documents, but sometimes new ones are created.
The documents that are updated or created by the API contain nested documents. But the number of documents with nested documents is quite low: there are only around 30M of them, fewer than about 8 such documents are indexed per second in total, and the API indexes only one document with nesting every 2-3 seconds or so.

cyberhuman · May 22, 2018, 6:25am

Here is what a typical log exhibiting the issue looks like:

[2018-05-22T05:25:37.8540] bulk_result elapsed 0.0299 secs stats "create": 1
[2018-05-22T05:25:37.8541] waiting for 1 documents to refresh
[2018-05-22T05:25:39.2726] waiting for 1 documents to refresh
[2018-05-22T05:25:40.4803] waiting for 1 documents to refresh
[2018-05-22T05:25:41.8195] waiting for 1 documents to refresh
[2018-05-22T05:25:43.0987] waiting for 1 documents to refresh
[2018-05-22T05:25:44.4356] waiting for 1 documents to refresh
[2018-05-22T05:25:45.7103] waiting for 1 documents to refresh
[2018-05-22T05:25:46.9689] waiting for 1 documents to refresh
[2018-05-22T05:25:47.8549] wait for refresh timed out after 10 secs

system · June 19, 2018, 6:25am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
