Elasticsearch high latency

Hi all,

We noticed high request latency for searches on our Elasticsearch cluster (7.17). While checking the metrics, we saw a spike in search_fetch_time for many indices, which were configured with 1 primary and 1 replica (1p:1r). I read on some forums that increasing the replica count could help, since it improves read throughput by spreading searches across more shard copies in parallel.
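
For reference, this is roughly how we pull those metrics, a quick sketch against the index stats API (host and index pattern are just placeholders):

curl -s 'localhost:9200/my-index-*/_stats/search?pretty'

We then compare the deltas of total.search.query_time_in_millis and total.search.fetch_time_in_millis between samples.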

However, our tests suggest this isn't the case. We ran load tests against the cluster with the original setup (1p:1r) and then repeated them after increasing the replica count to 1p:2r. The results showed that after the change the latency was even worse.
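
For reference, the replica change was applied roughly like this (host and index pattern are just placeholders):

curl -s -XPUT 'localhost:9200/my-index-*/_settings' -H 'Content-Type: application/json' -d '{"index": {"number_of_replicas": 2}}'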

The search_query_time and search_fetch_time metrics looked almost identical between the two runs, but the overall end-user request latency was almost twice what it was before. Can anyone provide some insight into why this may be happening?

What is the size and specification of your cluster? What type of storage are you using?

How many indices are you querying?

What is the full output of the cluster stats API?
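
For example, something like (host is just a placeholder):

curl -s 'localhost:9200/_cluster/stats?human&pretty'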

Thanks for your response.
Our test cluster has 4 nodes (3 data nodes and one voting-only node). The machines are in Azure and use Premium SSD LRS for the data disks.
Each search may run across up to 100 indices, as far as I understand.

Cluster stats

{
  "_nodes" : {
    "total" : 4,
    "successful" : 4,
    "failed" : 0
  },
  "cluster_name" : "sqa2-elastic",
  "cluster_uuid" : "xwHwemevRNuJrav2CXXXXX",
  "timestamp" : 1680710542562,
  "status" : "green",
  "indices" : {
    "count" : 361,
    "shards" : {
      "total" : 985,
      "primaries" : 361,
      "replication" : 1.7285318559556786,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 3,
          "avg" : 2.7285318559556786
        },
        "primaries" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "replication" : {
          "min" : 1.0,
          "max" : 2.0,
          "avg" : 1.7285318559556786
        }
      }
    },
    "docs" : {
      "count" : 26637074,
      "deleted" : 1981218
    },
    "store" : {
      "size_in_bytes" : 144333005039,
      "total_data_set_size_in_bytes" : 144333005039,
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size_in_bytes" : 35416664,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 173626844,
      "total_count" : 25393921,
      "hit_count" : 2160449,
      "miss_count" : 23233472,
      "cache_size" : 90241,
      "cache_count" : 100215,
      "evictions" : 9974
    },
    "completion" : {
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 5132,
      "memory_in_bytes" : 26996494,
      "terms_memory_in_bytes" : 17760800,
      "stored_fields_memory_in_bytes" : 4560648,
      "term_vectors_memory_in_bytes" : 276112,
      "norms_memory_in_bytes" : 2041216,
      "points_memory_in_bytes" : 0,
      "doc_values_memory_in_bytes" : 2357718,
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : 1676475309279,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "binary",
          "count" : 22,
          "index_count" : 22,
          "script_count" : 0
        },
        {
          "name" : "boolean",
          "count" : 281,
          "index_count" : 204,
          "script_count" : 0
        },
        {
          "name" : "constant_keyword",
          "count" : 6,
          "index_count" : 2,
          "script_count" : 0
        },
        {
          "name" : "date",
          "count" : 363,
          "index_count" : 294,
          "script_count" : 0
        },
        {
          "name" : "double",
          "count" : 1,
          "index_count" : 1,
          "script_count" : 0
        },
        {
          "name" : "integer",
          "count" : 28,
          "index_count" : 28,
          "script_count" : 0
        },
        {
          "name" : "ip",
          "count" : 2,
          "index_count" : 2,
          "script_count" : 0
        },
        {
          "name" : "keyword",
          "count" : 3889,
          "index_count" : 358,
          "script_count" : 0
        },
        {
          "name" : "long",
          "count" : 208,
          "index_count" : 115,
          "script_count" : 0
        },
        {
          "name" : "object",
          "count" : 69,
          "index_count" : 49,
          "script_count" : 0
        },
        {
          "name" : "text",
          "count" : 3503,
          "index_count" : 262,
          "script_count" : 0
        }
      ],
      "runtime_field_types" : [ ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [ ],
      "filter_types" : [
        {
          "name" : "edgeNGram",
          "count" : 49,
          "index_count" : 49
        },
        {
          "name" : "edge_ngram",
          "count" : 28,
          "index_count" : 28
        },
        {
          "name" : "nGram",
          "count" : 18,
          "index_count" : 18
        },
        {
          "name" : "ngram",
          "count" : 18,
          "index_count" : 18
        },
        {
          "name" : "pattern_capture",
          "count" : 22,
          "index_count" : 22
        },
        {
          "name" : "pattern_replace",
          "count" : 31,
          "index_count" : 22
        },
        {
          "name" : "stemmer",
          "count" : 29,
          "index_count" : 29
        },
        {
          "name" : "stop",
          "count" : 207,
          "index_count" : 122
        },
        {
          "name" : "word_delimiter",
          "count" : 46,
          "index_count" : 46
        },
        {
          "name" : "word_delimiter_graph",
          "count" : 17,
          "index_count" : 17
        }
      ],
      "analyzer_types" : [
        {
          "name" : "custom",
          "count" : 387,
          "index_count" : 122
        },
        {
          "name" : "pattern",
          "count" : 30,
          "index_count" : 30
        }
      ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [
        {
          "name" : "classic",
          "count" : 13,
          "index_count" : 13
        },
        {
          "name" : "standard",
          "count" : 103,
          "index_count" : 64
        },
        {
          "name" : "whitespace",
          "count" : 271,
          "index_count" : 89
        }
      ],
      "built_in_filters" : [
        {
          "name" : "cjk_bigram",
          "count" : 4,
          "index_count" : 4
        },
        {
          "name" : "cjk_width",
          "count" : 4,
          "index_count" : 4
        },
        {
          "name" : "lowercase",
          "count" : 387,
          "index_count" : 122
        }
      ],
      "built_in_analyzers" : [ ]
    },
    "versions" : [
      {
        "version" : "7.5.2",
        "index_count" : 73,
        "primary_shard_count" : 73,
        "total_primary_bytes" : 6717128910
      },
      {
        "version" : "7.17.0",
        "index_count" : 119,
        "primary_shard_count" : 119,
        "total_primary_bytes" : 51775389370
      },
      {
        "version" : "7.17.8",
        "index_count" : 169,
        "primary_shard_count" : 169,
        "total_primary_bytes" : 7268308246
      }
    ]
  },
  "nodes" : {
    "count" : {
      "total" : 4,
      "coordinating_only" : 0,
      "data" : 3,
      "data_cold" : 3,
      "data_content" : 3,
      "data_frozen" : 3,
      "data_hot" : 3,
      "data_warm" : 3,
      "ingest" : 3,
      "master" : 4,
      "ml" : 3,
      "remote_cluster_client" : 3,
      "transform" : 3,
      "voting_only" : 1
    },
    "versions" : [
      "7.17.8"
    ],
    "os" : {
      "available_processors" : 26,
      "allocated_processors" : 26,
      "names" : [
        {
          "name" : "Linux",
          "count" : 4
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Ubuntu 20.04.5 LTS",
          "count" : 4
        }
      ],
      "architectures" : [
        {
          "arch" : "amd64",
          "count" : 4
        }
      ],
      "mem" : {
        "total_in_bytes" : 109336596480,
        "free_in_bytes" : 4068691968,
        "used_in_bytes" : 105267904512,
        "free_percent" : 4,
        "used_percent" : 96
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 0
      },
      "open_file_descriptors" : {
        "min" : 439,
        "max" : 2205,
        "avg" : 1719
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 4338718234,
      "versions" : [
        {
          "version" : "19.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "19.0.1+10-21",
          "vm_vendor" : "Oracle Corporation",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 4
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 20538547000,
        "heap_max_in_bytes" : 55834574848
      },
      "threads" : 408
    },
    "fs" : {
      "total_in_bytes" : 3200839278592,
      "free_in_bytes" : 3054183788544,
      "available_in_bytes" : 2891337633792
    },
    "plugins" : [
      {
        "name" : "repository-azure",
        "version" : "7.17.8",
        "elasticsearch_version" : "7.17.8",
        "java_version" : "1.8",
        "description" : "The Azure Repository plugin adds support for Azure storage repositories.",
        "classname" : "org.elasticsearch.repositories.azure.AzureRepositoryPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false,
        "licensed" : false,
        "type" : "isolated"
      },
      {
        "name" : "ingest-attachment",
        "version" : "7.17.8",
        "elasticsearch_version" : "7.17.8",
        "java_version" : "1.8",
        "description" : "Ingest processor that uses Apache Tika to extract contents",
        "classname" : "org.elasticsearch.ingest.attachment.IngestAttachmentPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false,
        "licensed" : false,
        "type" : "isolated"
      }
    ],
    "network_types" : {
      "transport_types" : {
        "security4" : 4
      },
      "http_types" : {
        "security4" : 4
      }
    },
    "discovery_types" : {
      "zen" : 4
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "deb",
        "count" : 4
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 2,
      "processor_stats" : {
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        }
      }
    }
  }
}

It looks like you have around 144GB of data in the cluster. Why so many indices and shards? If I calculate correctly, your average shard size is around 200MB, which is very small.
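
Roughly, from the versions section of your stats:

6,717,128,910 + 51,775,389,370 + 7,268,308,246 bytes ≈ 65.8 GB of primary data
65.8 GB / 361 primary shards ≈ 180 MB per primary shard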

We are using ES as a search facility for different properties inside the website, for example documents, knowledge bases, user properties, etc. The application requires several indices with logically separate data, plus different versions of these indices (indexed at different points in time), per the app team's requirements.

@jaykb77, we have had a similar experience.
We tried increasing the replica count hoping to see improvements in query performance.
But it ended up being worse.
Not only did query time not improve (at least not noticeably to me), it also added load to writes (due to the extra replica).
So we ended up setting the replica count back down to 1 and improving overall performance in other ways.


Adding more replicas can help serve a larger number of concurrent queries, assuming you also add hardware. It can however have a negative impact on latencies, as there will be more data overall in the cluster and a smaller portion of it will fit in the operating system page cache.

As there is more data than memory, I would recommend examining what disk I/O latencies look like, e.g. using iostat -x. Having a lot of small shards can result in a lot of small, random, concurrent reads, which can become the bottleneck.
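
Something as simple as this while queries are running should be enough to get a feel for it:

iostat -x 1

Watch the row for your data disk; r_await and w_await (latencies in ms) and %util are the interesting columns.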


Our machines have 128GB of RAM with 31GB allocated to ES, so there is plenty of memory left for filesystem caching.
Our original hope was that the extra filesystem caching would help queries significantly, even with the extra data from the replica. I honestly couldn't tell the difference in read performance, so we ended up reverting back to just 1 replica.
CPU utilization was not high in our cluster at all, so the system was not under stress (at least by our standards) when I performed the test.

Can you share the mount parameters for this data disk from your fstab? I have seen a couple of issues in the past related to wrong mount parameters; basically you need to make sure that the disk is mounted with noatime.

Also, follow @Christian_Dahlqvist's suggestion and check the I/O performance of your disks.

Even if you have a Premium SSD, you need to test whether you are indeed getting the IOPS you should be getting.
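
One way to do that is a quick random-read test with fio, something along these lines (file path, size and runtime are just placeholders):

fio --name=randread --filename=/path/on/datadisk/fio-test --rw=randread --bs=4k --size=2G --ioengine=libaio --iodepth=32 --direct=1 --runtime=60 --time_based --group_reporting

Then compare the reported IOPS with what your Premium SSD tier is supposed to deliver.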

Below are the mount parameters for the drive we have

ext4 defaults 0 0

Are there any downsides to disabling atime?

There are some specific older tools that rely on the access time of files (the atime), but in the majority of cases there is no impact.

You can read more about it on this blog post.

Mounting with noatime can give you a small performance boost, but there is no guarantee that this is your issue; you still need to test the IOPS of your disk to see whether they are what you expect.
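
For example, taking the kind of entry you shared, it would become something like this (device and mount point are just placeholders):

<device>  <mount point>  ext4  defaults,noatime  0  0

followed by a remount (mount -o remount <mount point>) or a restart of the node.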

Just because Azure says you have a Premium SSD disk doesn't mean you are getting the performance you should; Azure may have issues of its own.

I had an issue with Azure a couple of years ago where I spent 3 weeks troubleshooting the performance of an Elasticsearch cluster, and in the end it was an underlying hardware issue on Azure: I had Premium SSDs but the performance was the same as an HDD.

Here I'm running iostat for the data drive, but a random iostat run doesn't show much, I think? Do we have to monitor this continuously to be able to tell whether we have problems with IOPS?

iostat -x 1 5 /dev/sdb1
Linux 5.15.0-1007-azure (xxx-elastic-masters-xxx-2-vm-0) 	04/27/23 	_x86_64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    0.07    0.04    0.00   99.38

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdb1             0.02      1.55     0.00   4.68    2.59    98.38    9.94    119.57     5.23  34.47    0.85    12.03    0.00    205.42     0.00  49.92    2.09 76420.31    0.00    0.00    0.01   1.29


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdb1             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.25    0.00    0.00    0.00    0.00   99.75

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdb1             0.00      0.00     0.00   0.00    0.00     0.00    5.00     56.00     9.00  64.29    1.20    11.20    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.01   0.80


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdb1             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.12    0.00    0.00    0.00    0.00   99.88

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdb1             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


The I/O stats look fine.

What do you consider to be high latency?

Are you running expensive queries?

Given that you are querying across a lot of very small shards, I would not be surprised if this contributed to your high latencies, as it does not sound optimal at all.

Normally we see response times for end-user requests through the frontend proxy of around 250-500ms. But during even small spikes in incoming searches (say 80-100 queries per minute, mainly spread across 4-5 indices), we see it increase well beyond 3-5s, which is a significant delay in response.

During bigger spikes in the incoming traffic, the response time grows even further, but we haven't yet been able to point to the bottleneck that is actually causing it. These are only moderately high request rates, and the cluster should be able to handle them without a problem, as we don't seem to be short of any resources.

The VM system metrics also don't show any noticeable change during such events, but the Elasticsearch search_fetch_time metric showed a considerable spike during a few of them, which made us think it's more of an Elasticsearch problem. This is the point at which we tried to increase the replicas, to see whether better read parallelism would improve the overall response.

Also, here is another snippet from an iostat run during a load test, where we found the most iowait.

Fri Apr 28 05:25:53 UTC 2023
Linux 5.15.0-1007-azure (xxx-elastic-masters-xxx-1-vm-0)  04/28/23        _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.69    0.00    0.09    0.04    0.00   99.18

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdb1             0.03      2.81     0.00   3.18    2.32   109.47   10.09    122.95     5.34  34.63    0.77    12.19    0.00    200.63     0.00  49.92    1.81 75004.70    0.00    0.00    0.01   1.24


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00   11.51    0.00   88.49

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdb1             0.00      0.00     0.00   0.00    0.00     0.00    2.00     36.00     7.00  77.78    0.50    18.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.40


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.13    0.00    0.13    1.25    0.00   98.50

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdb1             0.00      0.00     0.00   0.00    0.00     0.00    3.00     24.00     3.00  50.00  344.33     8.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.03 103.20


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.46    0.00    0.62    0.00    0.00   91.92

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdb1             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

What type of queries are you running? Are you using wildcard clauses? Are you using nested documents? Are you using parent-child? Do your queries have a very large number of clauses?

How many indices and shards does each of these queries target? What is the average and maximum?

If you are querying a large number of shards, it is possible that you are hitting concurrency limits, given that you only have 3 data nodes.
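
One thing you could check during a spike is the search thread pool, e.g.:

curl -s 'localhost:9200/_cat/thread_pool/search?v&h=node_name,active,queue,rejected'

If the queue is filling up or rejections are climbing, you are saturating the search threads on the 3 data nodes.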

It's more on the application side, and we do not directly manage what queries are being run. I am enabling slow logs now for the busy indices to get some insight into the queries, and will also get in touch with the developers.
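
For reference, the thresholds I am setting look roughly like this (host, index pattern and values are just examples):

curl -s -XPUT 'localhost:9200/my-busy-index-*/_settings' -H 'Content-Type: application/json' -d '
{
  "index.search.slowlog.threshold.query.warn": "2s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}'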

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.