Unknown response error when using the `_disk_usage` API

I'm looking at utilizing the new [_disk_usage](https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-disk-usage.html) API and have done so successfully in our staging environment.

However, when running it against indices in our production environment, I am getting errors similar to:

"type": "illegal_state_exception",
"reason": "unknown response [[index_name/id][[index_name][7]] BroadcastShardOperationFailedException[]; nested: RemoteTransportException[[node][#.#.#.#:9300][indices:admin/analyze_disk_usage[s]]]; nested: NullPointerException[Cannot invoke \"org.apache.lucene.index.PointValues.getMinPackedValue()\" because \"values\" is null];]"

I've been having trouble tracking down the actual root of this error, and any insight would be greatly appreciated (i.e. what is the "values" this refers to, and how can I track down the culprit?).

Hi @Ryan_Morrison, welcome to the community.

What version of the stack are you on and can you provide the exact command you are running?

version info:

"version": {
    "number": "7.16.2",
    "build_flavor": "default",
    "build_type": "docker",
    "build_hash": "############",
    "build_date": "2021-12-18T19:42:46.604893745Z",
    "build_snapshot": false,
    "lucene_version": "8.10.1",
    "minimum_wire_compatibility_version": "6.8.0",
    "minimum_index_compatibility_version": "6.0.0-beta1"
  },

I'm using the REST API tooling in Cerebro; the network request it sends looks like:

curl 'https://host_name/rest/request' \
  -H 'authority: host_name' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json;charset=UTF-8' \
  -H 'cookie: seerid=###########; GCP_IAP_UID=############; GCP_IAAP_AUTH_TOKEN_191E9AF5F3439174=###################' \
  -H 'origin: origin' \
  -H 'pragma: no-cache' \
  -H 'referer: referer' \
  -H 'sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="101", "Opera GX";v="87"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36 OPR/87.0.4390.56' \
  --data-raw '{"method":"POST","path":"index_name/_disk_usage?run_expensive_tasks=true","host":"host_name"}' \
  --compressed

What happens if you just try curl directly?

curl -XPOST --insecure -u "elastic:password" "https://localhost:9200/my-index/_disk_usage?run_expensive_tasks=true&pretty"

Well, I did a quick install and got it to work.
Make sure to take out the body and put it all on the URL line.

Yeah, we have it running in our staging environment and the request works just fine. It's mostly unclear where to begin looking for the true root of the problem. The error happens in Lucene, but that doesn't tell me where to look for the cause: being uninitiated in PointValues, I don't know what "values" is in that nested stack trace or how to troubleshoot further.

Same with curl (port forwarding to the actual host in this case):

 ~ curl -u elastic:<redacted> -X POST "localhost:9200/index_name/_disk_usage?run_expensive_tasks=true&pretty"

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_state_exception",
        "reason" : "unknown response [[index_name/id][[index_name][0]] BroadcastShardOperationFailedException[]; nested: RemoteTransportException[[index_name][#.#.#.#:9300][indices:admin/analyze_disk_usage[s]]]; nested: NullPointerException[Cannot invoke \"org.apache.lucene.index.PointValues.getMinPackedValue()\" because \"values\" is null];]"
      }
    ],
    "type" : "illegal_state_exception",
    "reason" : "unknown response [[index_name/id][[index_name][0]] BroadcastShardOperationFailedException[]; nested: RemoteTransportException[[index_name][#.#.#.#:9300][indices:admin/analyze_disk_usage[s]]]; nested: NullPointerException[Cannot invoke \"org.apache.lucene.index.PointValues.getMinPackedValue()\" because \"values\" is null];]"
  },
  "status" : 500
}

So, just to repeat: the request works fine in your staging environment but not in your production environment.

If you run it in the Kibana DevTools, you get the same error. Am I understanding that correctly?

I guess the Captain Obvious question is: what's the difference between the index in staging and the index in production?

Also, I assume that index is green? There are no missing shards, etc., correct?
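For anyone wanting to check this, one possible way is the cluster health API at shard level plus the cat shards API. The sketch below only builds the URLs. The helper names `index_health_url` and `index_shards_url` are hypothetical, and `localhost:9200` / `index_name` are placeholders mirroring the requests in this thread.

```shell
# Hypothetical helpers: print the URLs for checking index and shard state.
# $1 = host:port (placeholder), $2 = index name (placeholder)
index_health_url() {
  printf '%s/_cluster/health/%s?level=shards&pretty\n' "$1" "$2"
}

index_shards_url() {
  printf '%s/_cat/shards/%s?v=true\n' "$1" "$2"
}

# Usage (not executed here; credentials are placeholders):
# curl -u "elastic:<redacted>" "$(index_health_url localhost:9200 index_name)"
# curl -u "elastic:<redacted>" "$(index_shards_url localhost:9200 index_name)"
```

A `green` status with every shard `STARTED` would suggest the failure is not caused by missing or unassigned shards.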

@DavidTurner any thoughts?

An NPE is a bug (maybe in a plugin or maybe in ES itself). Please add the `error_trace` parameter to obtain a full stack trace, then open an issue about it on GitHub.
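As a rough sketch of that suggestion, the request from earlier in the thread could be repeated with `error_trace=true` appended to the query string. The helper name `disk_usage_url` is hypothetical, and the host and index are the same placeholders used above. The block only builds the URL, and the actual curl call is left commented out.

```shell
# Hypothetical helper: print the _disk_usage URL with stack traces enabled.
# $1 = host:port (placeholder), $2 = index name (placeholder)
disk_usage_url() {
  printf '%s/%s/_disk_usage?run_expensive_tasks=true&error_trace=true&pretty\n' "$1" "$2"
}

# Usage (not executed here; credentials are placeholders):
# curl -u "elastic:<redacted>" -X POST "$(disk_usage_url localhost:9200 index_name)"
```

With `error_trace=true`, the error response should include a `stack_trace` field that can be pasted into the GitHub issue.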


GitHub submission for posterity: "7.16 _disk_usage API - NullPointerException" (Issue #87761, elastic/elasticsearch)
