Elasticsearch inaccurate shard state when timed_out is true

niketpatel2525 · September 29, 2021, 7:49pm

Hey there,

I have an index with 40 shards, and I use the timeout feature with my search queries. According to the documentation, if the request times out before the search completes on all shards, it returns whatever results the other shards have returned. But the _shards section always reports 40 shards successful, 0 failed, and 0 skipped.

Can anyone suggest how I should make the request so that I either get correct shard stats, or get the timed_out flag as false along with accurate results while still using a timeout?

warkolm · September 29, 2021, 11:37pm

If you're getting results from all shards then doesn't that imply it's not timing out?

niketpatel2525 · September 30, 2021, 3:12pm

@warkolm Thanks for the reply. In my case, I know the actual count. When the timed_out flag is true, the total hits count is not accurate, yet at the same time the shard stats tell me all shards were successful.

{
  "took" : 11,
  "timed_out" : true,
  "_shards" : {
    "total" : 40,
    "successful" : 40,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2229157, // inaccurate value
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
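The response above shows why the _shards section alone cannot be used to judge completeness: all 40 shards are reported as successful, yet timed_out is true. A minimal sketch of a client-side check, assuming the response has already been parsed into a Python dict (the helper name is_partial is hypothetical and not part of any Elasticsearch client):

```python
def is_partial(response: dict) -> bool:
    """Return True when the hit count may be incomplete.

    In the response above, the timeout does not show up as a failed or
    skipped shard, so timed_out has to be checked separately.
    """
    shards = response.get("_shards", {})
    return bool(response.get("timed_out")) or shards.get("failed", 0) > 0


# The response from the post above, trimmed to the relevant fields.
response = {
    "took": 11,
    "timed_out": True,
    "_shards": {"total": 40, "successful": 40, "skipped": 0, "failed": 0},
    "hits": {"total": {"value": 2229157, "relation": "eq"}},
}

print(is_partial(response))
```

With a check like this, the caller can decide whether to discard the count, retry, or show it with a warning.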

warkolm · September 30, 2021, 9:44pm

How do you know it's inaccurate?

niketpatel2525 · October 28, 2021, 2:35am

@warkolm If I keep re-running the query, one of the requests eventually comes back with the flag set to false, and the count in that response is the highest I ever get. I have also indexed the same data in SQL; running the query there returns the same number I get when the flag is false.

warkolm · October 28, 2021, 2:40am

What do your Elasticsearch logs show?
What is the output from the _cluster/stats?pretty&human API?
What does hot threads show at the time?

niketpatel2525 · October 28, 2021, 10:52pm

I am getting this error while running GET _cluster/stats

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Values less than -1 bytes are not supported: -9223371980899287259b"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Values less than -1 bytes are not supported: -9223371980899287259b",
    "suppressed" : [
      {
        "type" : "illegal_state_exception",
        "reason" : "Failed to close the XContentBuilder",
        "caused_by" : {
          "type" : "i_o_exception",
          "reason" : "Unclosed object or array found"
        }
      }
    ]
  },
  "status" : 400
}

Omnisearch logs

[2021-10-28T21:51:20,762][ERROR][c.t.e.t.MetricsTransportInterceptor] [forecaster-prod-data-0] indices:data/read/search[phase/query] failure.
org.elasticsearch.transport.RemoteTransportException: [forecaster-prod-data-99][10.41.157.108:31436][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@3ef8af7b on QueueResizingEsThreadPoolExecutor[name = forecaster-prod-data-99/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 8.5ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@6e151c3c[Running, pool size = 13, active threads = 13, queued tasks = 1000, completed tasks = 4360042]]
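The executor description in the rejection above already contains the numbers needed to tell whether the search thread pool on that node was saturated. A rough sketch that pulls them out with a regular expression (the function name executor_state is hypothetical, and the parsing assumes the log format shown in this post):

```python
import re

# Executor description copied from the EsRejectedExecutionException above.
log_line = (
    "QueueResizingEsThreadPoolExecutor[name = forecaster-prod-data-99/search, "
    "queue capacity = 1000, min queue capacity = 1000, "
    "max queue capacity = 1000, frame size = 2000, "
    "targeted response rate = 1s, task execution EWMA = 8.5ms, "
    "adjustment amount = 50, "
    "org.elasticsearch.common.util.concurrent."
    "QueueResizingEsThreadPoolExecutor@6e151c3c[Running, pool size = 13, "
    "active threads = 13, queued tasks = 1000, completed tasks = 4360042]]"
)


def executor_state(line: str) -> dict:
    """Extract the integer 'key = value' pairs from the executor description."""
    pairs = re.findall(r"([a-z ]+?) = (\d+)(?=[,\]])", line)
    return {key.strip(): int(value) for key, value in pairs}


state = executor_state(log_line)

# Saturated: every thread is busy and the queue has reached its capacity.
saturated = (
    state["active threads"] == state["pool size"]
    and state["queued tasks"] >= state["queue capacity"]
)
print(state["pool size"], state["queued tasks"], saturated)
```

Here all 13 threads are active and the queue holds 1000 tasks against a capacity of 1000, so any further search request sent to this node at that moment is rejected.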

niketpatel2525 · November 1, 2021, 2:54pm

@warkolm Any suggestions based on the logs? Do I need to tune the cluster to handle the request load? 90% of the traffic has a filter size (term filter) between 1 and 25. I also have traffic with more than 500 filters in total in many cases (~2-3%).

warkolm · November 2, 2021, 3:15am

It looks like the node is overloaded.

What's the output from _cat/thread_pool?v?

Mark_Harwood · November 2, 2021, 9:29am

If I recall correctly a timeout on a shard is not reported as a failure or a skip.

An example of a skip is when your query is for a time range and you have time-based indices whose content lies outside the time-span of the query. We can skip querying those.
An example of a failure might be a node on an earlier software version failing to parse new query syntax.

You might be interested in the allow_partial_search_results query flag:

(Optional, Boolean) If true, returns partial results if there are shard request timeouts or shard failures. If false, returns an error with no partial results. Defaults to true.
To override the default for this field, set the search.default_allow_partial_results cluster setting to false.
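As a rough illustration of how the flag described above might be combined with a timeout, the sketch below only builds the request path and body and does not contact a cluster. In the search API the request parameter is spelled allow_partial_search_results. The function name build_count_request, the index name my-index, and the status field are hypothetical. The size and track_total_hits settings are assumptions based on the earlier response, which returned no hits and an exact total.

```python
import json
from urllib.parse import urlencode


def build_count_request(index: str, query: dict, timeout: str = "100ms"):
    """Return (path, body) for a search that errors out instead of
    returning a partial hit count."""
    params = urlencode({
        "timeout": timeout,
        # Ask for an error rather than partial results.
        "allow_partial_search_results": "false",
    })
    body = {
        "size": 0,                 # only the total is needed, not the documents
        "track_total_hits": True,  # request an exact total
        "query": query,
    }
    return f"/{index}/_search?{params}", json.dumps(body)


# Hypothetical index and field names, for illustration only.
path, body = build_count_request("my-index", {"term": {"status": "active"}})
print(path)
print(body)
```

The path and body can then be sent with whatever HTTP client or Elasticsearch client is already in use.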

niketpatel2525 · November 2, 2021, 1:34pm

@warkolm Here's the response for _cat/thread_pool?v

I couldn't paste the full output. Here are a few lines, mainly the ones where rejected was non-zero. For the remaining rows, the last three columns were mostly 0.

node_name                name                      active queue rejected
elasticsearch-prod-data-102 search                         0     0     6256
elasticsearch-prod-data-90  search                         0     0     1101
elasticsearch-prod-data-66  search                         0     0     3210
elasticsearch-prod-data-105 search                         0     0    14380
elasticsearch-prod-data-81  management                     1     0        0
elasticsearch-prod-data-81  search                         0     0     1378
elasticsearch-prod-data-48  management                     1     0        0
elasticsearch-prod-data-48  search                         0     0      274
elasticsearch-prod-data-30  management                     1     0        0
elasticsearch-prod-data-95  management                     1     0        0
elasticsearch-prod-data-95  search                         0     0     5258
elasticsearch-prod-data-115 management                     1     0        0
elasticsearch-prod-data-115 search                         0     0       36
elasticsearch-prod-data-71  management                     1     0        0
elasticsearch-prod-data-58  management                     1     0        0
elasticsearch-prod-data-58  search                         0     0     1016
elasticsearch-prod-data-107 management                     1     0        0
elasticsearch-prod-data-107 search                         0     0      467
elasticsearch-prod-data-77  management                     1     0        0
elasticsearch-prod-data-77  search                         0     0      330
elasticsearch-prod-data-0   management                     1     0        0
elasticsearch-prod-data-18  management                     1     0        0
elasticsearch-prod-data-18  search                         0     0     1951
elasticsearch-prod-data-74  management                     1     0        0
elasticsearch-prod-data-74  search                         0     0      524
elasticsearch-prod-data-46  management                     1     0        0
elasticsearch-prod-data-46  search                         0     0      465
elasticsearch-prod-data-22  management                     1     0        0
elasticsearch-prod-data-22  search                         0     0      398
elasticsearch-prod-data-16  management                     1     0        0
elasticsearch-prod-data-16  search                         0     0      457
elasticsearch-prod-data-26  management                     1     0        0
elasticsearch-prod-data-26  search                         0     0       31
elasticsearch-prod-data-28  management                     1     0        0
elasticsearch-prod-data-28  search                         0     0     1592
elasticsearch-prod-data-10  management                     1     0        0
elasticsearch-prod-data-10  search                         0     0     6397
elasticsearch-prod-data-34  management                     1     0        0
elasticsearch-prod-data-34  search                         0     0     2700
elasticsearch-prod-data-8   management                     1     0        0
elasticsearch-prod-data-8   search                         0     0    15827
elasticsearch-prod-data-42  management                     1     0        0
elasticsearch-prod-data-42  search                         0     0     5188
elasticsearch-prod-data-98  management                     1     0        0
elasticsearch-prod-data-54  management                     1     0        0
elasticsearch-prod-data-54  search                         0     0     6066
elasticsearch-prod-data-59  management                     1     0        0
elasticsearch-prod-data-94  management                     1     0        0
elasticsearch-prod-data-94  search                         0     0     1252
elasticsearch-prod-data-25  management                     1     0        0
elasticsearch-prod-data-25  search                         0     0      590
elasticsearch-prod-data-15  management                     1     0        0
elasticsearch-prod-data-15  search                         0     0      120
elasticsearch-prod-data-9   management                     1     0        0
elasticsearch-prod-data-9   search                         0     0     7357
elasticsearch-prod-data-44  management                     1     0        0
elasticsearch-prod-data-44  search                         0     0     4145
elasticsearch-prod-data-89  management                     1     0        0
elasticsearch-prod-data-89  search                         0     0     2388
elasticsearch-prod-data-85  management                     1     0        0
elasticsearch-prod-data-85  search                         0     0     1406
elasticsearch-prod-data-93  management                     1     0        0
elasticsearch-prod-leader-2 management                     1     0        0
elasticsearch-prod-data-1   management                     1     0        0
elasticsearch-prod-data-1   search                         0     0     2202
elasticsearch-prod-data-112 management                     1     0        0
elasticsearch-prod-data-112 search                         0     0     6540
elasticsearch-prod-data-6   management                     1     0        0
elasticsearch-prod-data-6   search                         0     0     5484
elasticsearch-prod-data-109 management                     1     0        0
elasticsearch-prod-data-109 search                         0     0     1037
elasticsearch-prod-data-56  management                     1     0        0
elasticsearch-prod-data-56  search                         0     0     1082
elasticsearch-prod-data-23  management                     1     0        0
elasticsearch-prod-data-23  search                         0     0     1115
elasticsearch-prod-data-21  management                     1     0        0
elasticsearch-prod-data-41  management                     1     0        0
elasticsearch-prod-data-41  search                         0     0     1990
elasticsearch-prod-data-73  management                     1     0        0
elasticsearch-prod-data-73  search                         0     0      105
elasticsearch-prod-data-55  management                     1     0        0
elasticsearch-prod-data-111 management                     1     0        0
elasticsearch-prod-data-111 search                         0     0     1604
elasticsearch-prod-data-96  management                     1     0        0
elasticsearch-prod-data-96  search                         0     0     1120
elasticsearch-prod-data-116 management                     1     0        0
elasticsearch-prod-data-116 search                         0     0    17439
elasticsearch-prod-data-38  management                     1     0        0
elasticsearch-prod-data-38  search                         0     0      914
elasticsearch-prod-data-65  management                     1     0        0
elasticsearch-prod-data-65  search                         0     0      668
elasticsearch-prod-data-88  management                     1     0        0
elasticsearch-prod-data-88  search                         0     0     1879
elasticsearch-prod-data-17  management                     1     0        0
elasticsearch-prod-data-17  search                         0     0     2382
elasticsearch-prod-data-13  management                     1     0        0
elasticsearch-prod-data-13  search                         0     0     1248
elasticsearch-prod-data-68  management                     1     0        0
elasticsearch-prod-data-68  search                         0     0      529
elasticsearch-prod-data-60  management                     1     0        0
elasticsearch-prod-data-60  search                         0     0    11556
elasticsearch-prod-data-24  management                     1     0        0
elasticsearch-prod-data-24  search                         0     0     2344
elasticsearch-prod-data-2   management                     1     0        0
elasticsearch-prod-data-2   search                         0     0      635
elasticsearch-prod-data-63  management                     1     0        0
elasticsearch-prod-data-63  search                         0     0     4692
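A long listing like the one above is easier to read once the search rows are ranked by rejections. A small sketch, assuming the five-column layout shown in this post (the function name top_search_rejections is hypothetical, and the sample contains only a few rows copied from the table):

```python
def top_search_rejections(cat_output: str, limit: int = 3):
    """Return the (node, rejected) pairs with the most search rejections."""
    rows = []
    for line in cat_output.strip().splitlines()[1:]:  # skip the header row
        node, pool, active, queue, rejected = line.split()
        if pool == "search":
            rows.append((node, int(rejected)))
    return sorted(rows, key=lambda row: row[1], reverse=True)[:limit]


sample = """\
node_name                name                      active queue rejected
elasticsearch-prod-data-102 search                         0     0     6256
elasticsearch-prod-data-105 search                         0     0    14380
elasticsearch-prod-data-81  management                     1     0        0
elasticsearch-prod-data-8   search                         0     0    15827
elasticsearch-prod-data-116 search                         0     0    17439
"""

for node, rejected in top_search_rejections(sample):
    print(node, rejected)
```

The rejected column is a cumulative counter, so a snapshot with active and queue at 0 but a large rejected value suggests that rejections happen in bursts rather than continuously.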

niketpatel2525 · November 2, 2021, 1:38pm

@Mark_Harwood I have tried using the allow_partial_search_results query param with my requests, but I ended up with higher latency, which is not acceptable in my case. I want to optimize the cluster, if possible, with allow_partial_search_results in place.

Mark_Harwood · November 2, 2021, 4:10pm

I'm not sure if you're setting it to true or false.
If anything, I'd expect allow_partial_search_results:false to return faster, because in the event of a timeout or failure it curtails search activity that would otherwise continue.

niketpatel2525 · November 2, 2021, 4:48pm

@Mark_Harwood I set allow_partial_search_results to false, because I want to get an accurate search result count.

Mark_Harwood · November 2, 2021, 5:28pm

"Because I want to get an accurate search result count."

allow_partial_search_results:false is for returning only errors rather than partial results with warnings.

allow_partial_search_results:true is the behaviour typically wanted by ecommerce product searches that want to "keep on trucking" and show some results in the event of timeouts and partial failures.

allow_partial_search_results:false is the behaviour typically wanted by analytics, e.g. summing financial figures where totals are important. A timeout or partial failure while gathering data has an unbounded error margin (we don't know what we might be missing), which is unacceptable to many, so it is better to show nothing than something wildly inaccurate.

