Queries get slow while indexing documents

taichi · October 7, 2020, 9:13am

I have an index with approximately 2.3M documents.

It usually returns query results within 1~2 seconds, but it gets very slow (sometimes as long as 20~30s) while I'm inserting data into the index.

I load data from an RDB into the index in a batch process using the Bulk API (about 100 documents per second).

Why does this happen?
I could not find any documentation about this behavior.

dadoonet · October 7, 2020, 9:28am

What kind of query are you running? Could you share an example?
What is the response? Could you share at least the first 20 lines?

Also what kind of hardware do you have? Heap size, SSD vs HDD...

taichi · October 8, 2020, 1:59am

Counting all documents (like _count with {"query": {"match_all": {}}}).
Other search queries also sometimes get slow.

I cannot share the response, but response size is small.

I'm using m5.xlarge.elasticsearch * 2 (4 vCPU, 16 GB RAM, 512 GB SSD) for data nodes.

dadoonet · October 8, 2020, 6:15am

Why? There's nothing secret in the first lines.

dadoonet · October 8, 2020, 6:17am

BTW a match_all normally taking some seconds is way too slow.

What is the output of:

GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v

If some outputs are too big, please share them on gist.github.com and link them here.

Christian_Dahlqvist · October 8, 2020, 6:30am

Indexing can be both CPU and disk I/O intensive. I do not know what monitoring you have access to, but it would be good to try to identify whether CPU or disk I/O is limiting performance. If you are using gp2 EBS, it gets IOPS proportional to its size, and since indexing results in a lot of disk I/O, that is probably what I would start with.
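
As a rough illustration of the gp2 point, here is a minimal sketch of the baseline IOPS arithmetic. It assumes the published gp2 rule of 3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000, and it assumes the 512 GB volume mentioned earlier in the thread is actually gp2, which the thread does not confirm. The function name is made up for this example.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a gp2 volume: 3 IOPS per GiB, clamped to [100, 16000]."""
    return max(100, min(16000, 3 * size_gib))


# Assumption: the 512 GB data volume from the thread is a gp2 volume.
baseline = gp2_baseline_iops(512)
print(baseline)  # 1536
```

Smaller gp2 volumes can burst to 3,000 IOPS while burst credits last, so a sustained bulk load may run fine at first and then fall back to the baseline once the credits are used up.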

taichi · October 8, 2020, 6:31am

Sorry, the search response looks like this (just one example):

{
    "took": 6554,
    "timed_out": false,
    "_shards": {
        "total": 2,
        "successful": 2,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": 6.086506,
        "hits": [...]
    }
}

The count response is:

{
    "count": 2300000,
    "_shards": {
        "total": 2,
        "successful": 2,
        "skipped": 0,
        "failed": 0
    }
}

taichi · October 8, 2020, 6:37am

I created a gist for this.

https://gist.github.com/taichi-jp/985a4b867f6be1348b9a31bb483630d1

GET /

{
    "name": "xxx",
    "cluster_name": "xxx",
    "cluster_uuid": "xxx",
    "version": {
        "number": "7.7.0",
        "build_flavor": "oss",
        "build_type": "tar",
        "build_hash": "unknown",
        "build_date": "2020-08-18T20:35:37.721611Z",

This file has been truncated.

GET /_cat/health?v

epoch      timestamp cluster                             status node.total node.data discovered_master shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1602138840 06:34:00  xxxxxxxxxxxxxxxxxxx green           5         2              true     14   7    0    0        0             0                  -                100.0%

GET /_cat/indices?v

health status index                 uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   xxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxx   2   1    2300000            0      4.4gb          2.2gb
green  open   .kibana_1             xxxxxxxxxxxxxxxxxxxxxx   1   1          1            0      7.7kb          3.8kb

There are more than three files in the gist.

Christian_Dahlqvist · October 8, 2020, 6:52am

You may also look at making indexing more efficient, e.g. by increasing the refresh interval of the index if you have not already done so. You might even disable it during the bulk load and just enable it afterwards.
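
A minimal sketch of the refresh-interval suggestion, assuming an unauthenticated cluster reachable at `http://localhost:9200` and a hypothetical index name `my-index` (neither comes from the thread, and a managed domain will usually need authentication or request signing on top of this). It only builds the two `PUT <index>/_settings` requests; sending them is left commented out.

```python
import json
import urllib.request


def refresh_settings_request(base_url: str, index: str, interval: str) -> urllib.request.Request:
    """Build (without sending) a PUT <index>/_settings request that sets index.refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


BASE_URL = "http://localhost:9200"  # assumption: local, unauthenticated cluster
INDEX = "my-index"                  # hypothetical index name

# "-1" disables periodic refresh; "1s" is the Elasticsearch default interval.
disable = refresh_settings_request(BASE_URL, INDEX, "-1")
restore = refresh_settings_request(BASE_URL, INDEX, "1s")

# Intended order (commented out so the sketch runs without a cluster):
# urllib.request.urlopen(disable)
# ... run the bulk load ...
# urllib.request.urlopen(restore)
```

With refresh disabled, newly indexed documents do not become searchable until the interval is restored or a refresh is triggered, so this fits a batch load better than a workload that needs fresh results during indexing.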

system · November 5, 2020, 6:52am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
