Elasticsearch 7.2 slow query after update

I noticed that after posting update to even single document when I search on that index for this single document (query is not trivial but not especially complex either but it has nested element)
there is a perceptible delay in getting result. I looked at took values and noticed that after update the query takes 700+ ms where normally it would take just 40 ms. It feels almost like when I hit search first time after elastic started with this perceptible delay in getting results

I index/query using low level rest client and use bulk for indexing (even when just one document - it could be one or many depending)

I would greatly appreciate if you could point me to something i can troubleshoot

Thank you,
alex

What happens if you trigger manually a refresh of the index before running the search?
What kind of hardware do you have? SSD or spinning disks?
What is the query looking like?

Hi @dadoonet,

Thank you for looking into it

After refresh it seems to give me consistent times of 5ms (but also after elastic restart slow times went from 700-1000ms to 300-500 and became less consistently reproducible)
It is on my development workstation with NVMe SSD. Single node with out of the box config. Not much data (couple of indices under total 1G but documents with hundreds of fields and two levels of nesting. In that index is about 40K of main docs and 1000K nested). I would think default elastic memory allocation is sufficient for this size of the data

{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "multi_match": {
                  "query": "bean",
                  "type": "phrase",
                  "slop": 5,
                  "fields": [
                    "text_search.protectiveOrderIssues.prefix",
                    "text_search.protectiveOrderIssues^2"
                  ]
                }
              }
            ],
            "_name": "criteria_root",
            "filter": [
              {
                "term": {
                  "secondaryDocket": false
                }
              }
            ]
          }
        },
        {
          "nested": {
            "path": "secondaryDockets",
            "query": {
              "bool": {
                "must": [
                  {
                    "multi_match": {
                      "query": "bean",
                      "type": "phrase",
                      "slop": 5,
                      "fields": [
                        "secondaryDockets.text_search.protectiveOrderIssues.prefix",
                        "secondaryDockets.text_search.protectiveOrderIssues^2"
                      ]
                    }
                  }
                ]
              }
            },
            "_name": "criteria_secondary_dockets",
            "inner_hits": {
              "_source": false,
              "highlight": {
                "number_of_fragments": 0,
                "fragment_size": 300,
                "encoder": "html",
                "require_field_match": true,
                "fields": {
                  "secondaryDockets.text_search.protectiveOrderIssues.prefix": {}
                }
              }
            }
          }
        }
      ]
    }
  },
  "highlight": {
    "number_of_fragments": 0,
    "fragment_size": 300,
    "encoder": "html",
    "require_field_match": true,
    "fields": {
      "text_search.protectiveOrderIssues.prefix": {}
    }
  },
  "_source": {
    "includes": [
      "id",
      "epdsAnumber",
      "bNumber",
      "caseNumber",
      "protester",
      "primaryAgency",
      "status",
      "solicitationNumbers",
      "secondaryDocket",
      "type.abbreviation",
      "filedDate",
      "dueDate",
      "agencyReportDue",
      "agencyReportCommentsDue",
      "daysToDueDate",
      "team",
      "assistantGc",
      "attorney",
      "identifiers",
      "secondaryDockets.id",
      "secondaryDockets.epdsAnumber",
      "secondaryDockets.bNumber",
      "secondaryDockets.caseNumber",
      "secondaryDockets.protester",
      "secondaryDockets.primaryAgency",
      "secondaryDockets.status",
      "secondaryDockets.solicitationNumbers",
      "secondaryDockets.secondaryDocket",
      "secondaryDockets.type.abbreviation",
      "secondaryDockets.filedDate",
      "secondaryDockets.dueDate",
      "secondaryDockets.agencyReportDue",
      "secondaryDockets.agencyReportCommentsDue",
      "secondaryDockets.daysToDueDate",
      "secondaryDockets.team",
      "secondaryDockets.assistantGc",
      "secondaryDockets.attorney",
      "secondaryDockets.identifiers"
    ]
  },
  "from": 0,
  "size": 20,
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "caseNumber.fileNumber": {
        "order": "asc",
        "missing": "_first"
      }
    },
    {
      "caseNumber.caseNumber": {
        "order": "asc",
        "missing": "_first"
      }
    },
    {
      "filedDate": {
        "order": "asc",
        "missing": "_first"
      }
    }
  ]
}

any thoughts anyone?

Some thoughts:

In 7.x series we changed to way we are running refresh. If there is no search activity, then the refresh is not called IIRC.
If you have a search request coming then the refresh is triggered before the search is actually ran.

So I was wondering if you were hitting this behavior as you are the only one to run a search.

This is exactly how it feels. I thought about it too but did not know of the change in behavior.
I think for my use case this is not the best way to run refresh. I have an application where updates to domain objects (backed by ORM and relational database) streamed to elastic behind the scene in almost real time (when load is light it will be effectively almost one by one when heavy it could be in small batches)

Write load on elastic is light and steady and typical search load is light and consists of simple searches which suppose to return instantly. When people do more sophisticated searches and especially various analytics and big reports search load may pick up.

My concern is the steady write stream will constantly leave elastic in not refreshed state and all this small searches will often exhibit unpleasant 1000 ms delay which make system appear sluggish

I do not want to request updates to block till refresh is done (is it even supported for batch enpoint?) because I do not need it. Is the refresh strategy configurable? If so will it always be an option - I do not want to use some temporary backward compatibility option to be removed in next major release

Thank you,

alex

7.2 Docs say

index.refresh_interval

How often to perform a refresh operation, which makes recent changes to the index visible to search. Defaults to 1s . Can be set to -1 to disable refresh. If this setting is not explicitly set, shards that haven’t seen search traffic for at least index.search.idle.after seconds will not receive background refreshes until they receive a search request. Searches that hit an idle shard where a refresh is pending will wait for the next background refresh (within 1s ). This behavior aims to automatically optimize bulk indexing in the default case when no searches are performed. In order to opt out of this behavior an explicit value of 1s should set as the refresh interval.

As I understand it correctly it should not used delayed strategy by default
Also when I reindex my whole DB I create new index and I set refresh to -1 and then to null to return to default. Elastic log reports:

... updating [index.refresh_interval] from [1s] to [-1]

... updating [index.refresh_interval] from [-1] to [1s]

So I assume my default is 1sand refresh delay should not apply but it looks like it does

You can run an index query with ?refresh=wait_for which will wait for the next refresh to occur before sending you the response. See https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html

Is the refresh strategy configurable?

I "think" that index.search.idle.after is the new setting. See https://www.elastic.co/guide/en/elasticsearch/reference/7.2/index-modules.html

How long a shard can not receive a search or get request until it’s considered search idle. (default is 30s).

So if you don't have any search within 30s, the next search will take more time probably (refresh+search).
You can probably define it to something bigger like 5m?

Docs seem to be inconsistent with what happens

The setting 'index.refresh_interval' is set explicitly so deferring refresh should not be in play at all per the document

Either the doc is unclear (wrong?) or the implementation or debug logs show 1s even when it is actually NOT set for an index based on some global default in which case it does act appropriately and I need to test setting 1s explicitly

also in light of this
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html part describing " To reset a setting back to the default value, use null" is confusing because "not set" value (null) now is different from a set value it means defer refresh and not merely set to default. What will setting to null do?

  • reset to global default
  • unset to enable deferred refresh

Do you think I should open a ticket?

Hmmm. If I read that documentation issue and its associated PR correctly, it looks like that setting refresh_interval should stop the "idle" behavior.

I don't know if it's an issue and I'd like to have @jpountz commenting on this before you open an issue.

@roytmana Can you set the refresh_interval explicitly to 1s? Both the default behavior (refresh_interval not set) and refresh_interval: 1s guarantee that documents that have been updated up to 1 second ago are visible, but in a different way as noted above.

@jpountz I set refresh_interval to "1s" explicetly (rather than setting it to null to take default after setting it to -1 for full reindex) and it did not resolve the issue. If I search right after update the first query takes 900ms instead of 15-20 any similar query normally takes

Would you be able to capture hot threads while the first query is loading, so that I can try to understand what it is doing?

i tested some more and I think my previous conclusion (after setting 1s explicitly) was premature. Maybe I tried to search too soon after update and it catches the tail of normal refresh and gives me longer duration. It looks like 1s refresh translates to couple of seconds to really get it searched. When I deliberately counted 5 seconds after update before searching all seems to be good. Let me test some more tonight and I will report tomorrow

2 Likes

Seems to be fine when explicitly specified. It would be nice to be able to explicitly specify if deferred refresh is desired - configure refresh interval and deferral independently. I think it is ambiguous now including setting null to refresh interval which supposed to revert to default but which of the attributes deferral or interval?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.