1 of 105 shards failed with Kibana 7.2.0

Hello, I have an Elastic Cloud deployment with Elasticsearch and its associated Kibana.

I also have an on-premise Kibana instance (same version: 7.2.0, see the configuration below).

I created index patterns that refer to several indices (the same pattern on both Kibanas).

  • On the Cloud Kibana, there is no problem; queries over this index pattern work fine
  • On the on-premise Kibana, I get the shard failure issue (it seems to occur on the biggest index, but I cannot confirm)

For your information, the same query works fine from the Dev Tools section. Do you have any leads on this behavior?

Best regards

server.basePath: "/bigdata"

kibana.index: .kibana-revima-ahm

elasticsearch.hosts: ["https://XXXX.eu-west-1.aws.found.io:9243"]
elasticsearch.username: "XXXX"
elasticsearch.password: "XXXX"

logging.dest: /home/XXX/logs/kibana.log

xpack.reporting.encryptionKey: "XXXX"
xpack.reporting.kibanaServer.port: 443
xpack.reporting.kibanaServer.protocol: https
xpack.reporting.kibanaServer.hostname: XXX.flightwatching.com
xpack.reporting.index: ".reporting-XXX"

Hi there!

I don't see anything obviously wrong in your config. Some additional details would help us get to the bottom of this:

  • What action are you trying to do that causes the error? If it's a search, can you provide a copy of the search?
  • Are there any more error details in the failed request in the Network tab of your browser?
  • Are there any error logs on the server you could share?

Yes, it is from the Discover part.

1/ Here is a copy of the request (via the Inspect button, see below)
2/ Nothing interesting in the Chrome Network tab
3/ Logs below (from Elastic Cloud)

REQUEST

{
  "version": true,
  "size": 500,
  "sort": [
    {
      "@timestamp": {
        "order": "desc",
        "unmapped_type": "boolean"
      }
    }
  ],
  "_source": {
    "excludes": []
  },
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "30s",
        "time_zone": "Europe/Paris",
        "min_doc_count": 1
      }
    }
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {
    "aps3200_MES1_EGT_DELTA": {
      "script": {
        "source": "if (doc.containsKey('MES1_EGTA'))  {\n  return (333 - doc['MES1_EGTA'].value); \n}\nreturn null;",
        "lang": "painless"
      }
    },
    "aps3200_MES2_EGT_DELTA": {
      "script": {
        "source": "if (doc.containsKey('MES2_EGTA'))  {\n  return (333 - doc['MES2_EGTA'].value); \n}\nreturn null;",
        "lang": "painless"
      }
    },
    "a380 MES2 EGT MARGIN": {
      "script": {
        "source": "if (doc.containsKey('MES2_APUEGT')) {\n\n  double[] C90 = new double[] {333, -1, 0.04};\n  double[] C180 = new double[] {333, -1.5, 0.05};\n  double[] C240 = new double[] {333, -1.5, 0.05};\n\n  double IT=doc['MES2_APUIT'].value;\n  double EGT=doc['MES2_APUEGT'].value;\n  double KV = doc['MES2_APUGEN1'].value+doc['MES2_APUGEN2'].value;\n\n  double CORR_90=C90[1]*IT+C90[0];\n  double CORR_180=C180[1]*IT+C180[0];\n  double CORR_240=IT*IT*C240[2]+C240[1]*IT+C240[0];\n\n  if (EGT>0) {\n    if (KV<333) {\n      return EGT-CORR_90;\n    } else if (KV<180) {\n      return EGT-(CORR_90+(KV-333)*(CORR_180-CORR_90)/(333));\n    } else if (KV>333) {\n      return EGT-(CORR_180+(KV-333)*(CORR_240-CORR_180)/(3333-333));\n    }\n  } \n} \nreturn null;",
        "lang": "painless"
      }
    },
    "a380 MES2 TOTAL GEN": {
      "script": {
        "source": "if (doc.containsKey('MES2_APUGEN1') && doc.containsKey('MES2_APUGEN2')) {\n    return doc['MES2_APUGEN1'].value+doc['MES2_APUGEN2'].value;\n} \nreturn null;",
        "lang": "painless"
      }
    },
    "a380 MES2 DIFF GEN": {
      "script": {
        "source": "if (doc.containsKey('MES2_APUGEN1') && doc.containsKey('MES2_APUGEN2') ) { \n\nreturn Math.abs(doc['MES2_APUGEN2'].value - doc['MES2_APUGEN1'].value); }\n\nreturn null;",
        "lang": "painless"
      }
    },
    "a380 MES1 DIFF GEN": {
      "script": {
        "source": "if (doc.containsKey('MES1_APUGEN1') && doc.containsKey('MES1_APUGEN2') ) { return Math.abs(doc['MES1_APUGEN2'].value - doc['MES1_APUGEN1'].value); }\nreturn null;",
        "lang": "painless"
      }
    },
    "a380 MES3 DIFF GEN": {
      "script": {
        "source": "if (doc.containsKey('MES3_APUGEN1') && doc.containsKey('MES3_APUGEN2') ) { return Math.abs(doc['MES3_APUGEN2'].value - doc['MES3_APUGEN1'].value); }",
        "lang": "painless"
      }
    },
    "a380 MES4 DIFF GEN": {
      "script": {
        "source": "if (doc.containsKey('MES4_APUGEN1') && doc.containsKey('MES4_APUGEN2') ) { return Math.abs(doc['MES4_APUGEN2'].value - doc['MES4_APUGEN1'].value); }",
        "lang": "painless"
      }
    },
    "RUNDN_TIME_NORM": {
      "script": {
        "source": "if (doc.containsKey('RUNDN_TIME_RUNDN')) { \n\n return doc['RUNDN_TIME_RUNDN'].value/((doc['TAT'].value+ 333)/333) ; } \n\nreturn null;",
        "lang": "painless"
      }
    },
    /// ...SEVERAL MORE SCRIPTED FIELDS...
    "APS5000_egt21Amb": {
      "script": {
        "source": "if (doc.containsKey('MES_EGT_SEL')) { \n\ndouble egt21= ((doc['MES_EGT_SEL'].value+333)/((doc['MES_TAMB'].value+ 333)/(333+333))-333);\n \nif ((doc['MES_EGT_SEL'].value)>0) {\nreturn egt21-333*(doc['MES_TAMB'].value)+333; } \n}\nreturn null;",
        "lang": "painless"
      }
    }
  },
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "APUC_CHANGED",
      "format": "date_time"
    },
    {
      "field": "FMU_CHANGED",
      "format": "date_time"
    },
    {
      "field": "acars.reception_date",
      "format": "date_time"
    },
    {
      "field": "acars.transmission_date",
      "format": "date_time"
    },
    {
      "field": "wilco_date",
      "format": "date_time"
    }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "H_REP.keyword": {
              "query": "202"
            }
          }
        },
        {
          "range": {
            "@timestamp": {
              "format": "strict_date_optional_time",
              "gte": "2019-08-06T08:32:41.499Z",
              "lte": "2019-08-06T08:47:41.499Z"
            }
          }
        }
      ],
      "filter": [
        {
          "match_all": {}
        }
      ],
      "should": [],
      "must_not": []
    }
  },
  "highlight": {
    "pre_tags": [
      "@kibana-highlighted-field@"
    ],
    "post_tags": [
      "@/kibana-highlighted-field@"
    ],
    "fields": {
      "*": {}
    },
    "fragment_size": 2147483647
  }
}

LOGS

WARN
[instance-0000000061] [PUT /_xpack/license] is deprecated! Use [PUT /_license] instead.
6 August 2019 at 8:30:34 UTC

WARN (several like this)
[instance-0000000061] [interval] on [date_histogram] is deprecated, use [fixed_interval] or [calendar_interval] in the future.

This query doesn't look particularly problematic.

  • You mentioned that this works on the Cloud instance. Is there any difference you see in the request compared to the on-prem Kibana instance?
  • Does the inspect flyout show you any additional information about the response? You should be able to get more information about why the shard failed.
  • What exactly is the error message you're seeing? Is it all shards failing or just some?
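As an aside, the deprecation warnings in the logs are about the `interval` parameter of the `date_histogram` aggregation and are unrelated to the shard failure. Per the warning message itself, newer requests would use `fixed_interval` (or `calendar_interval`) instead; the aggregation from the request above would become something like:

```json
"aggs": {
  "2": {
    "date_histogram": {
      "field": "@timestamp",
      "fixed_interval": "30s",
      "time_zone": "Europe/Paris",
      "min_doc_count": 1
    }
  }
}
```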

I have some script exceptions like the one below. Could that be the problem?

"failures": [
  {
    "shard": 0,
    "index": "wilco__revima-qtr__b787__606__v2",
    "node": "b7hqHKI-Q-y9q3ngj_5EBw",
    "reason": {
      "type": "script_exception",
      "reason": "runtime error",
      "script_stack": [
        "org.elasticsearch.index.fielddata.ScriptDocValues$Doubles.get(ScriptDocValues.java:249)",
        "org.elasticsearch.index.fielddata.ScriptDocValues$Doubles.getValue(ScriptDocValues.java:243)",
        "return doc['RUNDN_TIME_RUNDN'].value/((doc['TAT'].value+ 273.15)/294.15) ; } \n\n",
        "                              ^---- HERE"
      ],
      "script": "if (doc.containsKey('RUNDN_TIME_RUNDN')) { \n\n return doc['RUNDN_TIME_RUNDN'].value/((doc['TAT'].value+ 273.15)/294.15) ; } \n\nreturn null;",
      "lang": "painless",
      "caused_by": {
        "type": "illegal_state_exception",
        "reason": "A document doesn't have a value for a field! Use doc[<field>].size()==0 to check if a document is missing a field!"
      }
    }
  },

That looks like the problem! The scripted fields in the on-prem Kibana must be different from the ones in the Cloud Kibana. You'll either need to fix or remove this scripted field in the on-prem Kibana to get this working.
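Concretely, the error message in the failure above tells you the fix: `doc.containsKey(...)` only checks that the field exists in the mapping, not that the current document has a value for it, so you also need a `doc[field].size()` guard. A sketch of a guarded version of the failing `RUNDN_TIME_NORM` script (the same pattern applies to each scripted field that reads doc values):

```painless
// Guard both fields: containsKey checks the mapping,
// size() != 0 checks that this document actually has a value.
if (doc.containsKey('RUNDN_TIME_RUNDN') && doc['RUNDN_TIME_RUNDN'].size() != 0
    && doc.containsKey('TAT') && doc['TAT'].size() != 0) {
  return doc['RUNDN_TIME_RUNDN'].value / ((doc['TAT'].value + 273.15) / 294.15);
}
return null;
```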

Yes, understood. But I do not know why it is reported as a shard issue... the message is confusing...

"Shard failure" is a pretty generic error message that indicates that an Elasticsearch shard failed to run the query. It is admittedly a confusing message as it can indicate a number of different problems.