Kibana is unable to query Elasticsearch after restoring ELK stack

Hello everyone, I have come across an issue where Kibana appears to be unable to query Elasticsearch at all, no matter the query. It has reached the point where a query will not finish within 10 minutes; it just keeps saying "loading documents". I have also tried different-sized indices with different queries (return everything, or return only from a specific index) and nothing changes. For background, this all started after we restored our ELK stack server with Veeam following a move to new hardware. Before that point everything worked without a hitch. Also, after first restarting the restored server, Elasticsearch tried to create many unassigned shards, and we deleted some of them to free up shard space, in case that is relevant to the problem.

The main steps I have tried to fix/diagnose this problem have been the following.

  • Confirm Elasticsearch is up and queryable.
    - Confirmed by querying with both the Elasticsearch REST API and Dev Tools inside Kibana (see the sketch after this list).
  • Confirm the timestamp field in the indices I query is actually mapped as a date.
    - Confirmed by checking the mapping in Kibana's Index Management.
  • Check the Network tab in the browser's developer tools for any error messages.
    - The query runs for 10 minutes before timing out with no response, aside from an on-screen error with status 0.
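
For reference, the checks I ran looked roughly like the following. The host, index name, and credentials are placeholders for my setup, and on 8.x with security enabled the calls go over HTTPS with basic auth or an API key:

# cluster health
curl -k -u elastic:<password> "https://localhost:9200/_cluster/health?pretty"

# simple search against one index to confirm Elasticsearch answers queries
curl -k -u elastic:<password> "https://localhost:9200/<index-name>/_search?size=1&pretty"

# confirm the timestamp field is mapped as a date
curl -k -u elastic:<password> "https://localhost:9200/<index-name>/_mapping/field/@timestamp?pretty"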

There are no relevant logs for Elasticsearch in journalctl, and the logs in Kibana's journal are not very helpful either, as they only mention

No matching indices found: logs-osquery_manager.result*

and there is also an error about an API key in the Kibana logs, but I have never given Kibana an API key; that part is still commented out in my kibana.yml file.

I apologize if this is not the proper format for this board, as this is my first time posting. Feel free to let me know what else would help you diagnose this problem, and I will do my best to get it to you ASAP.

Welcome to our community! :smiley:

What is the output from the _cluster/stats?pretty&human API?
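
For example, something along these lines (host and credentials are placeholders; adjust for your setup):

curl -k -u elastic:<password> "https://localhost:9200/_cluster/stats?pretty&human"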

Hello, thank you for the quick response and the warm welcome. I probably should have mentioned that our ELK stack currently runs on a single server. Here are the results:

{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "elasticsearch",
  "cluster_uuid": "8SssFFyiSUikolabt82y6Q",
  "timestamp": 1671586402824,
  "status": "red",
  "indices": {
    "count": 472,
    "shards": {
      "total": 472,
      "primaries": 472,
      "replication": 0,
      "index": {
        "shards": {
          "min": 1,
          "max": 1,
          "avg": 1
        },
        "primaries": {
          "min": 1,
          "max": 1,
          "avg": 1
        },
        "replication": {
          "min": 0,
          "max": 0,
          "avg": 0
        }
      }
    },
    "docs": {
      "count": 2590945328,
      "deleted": 1004468
    },
    "store": {
      "size_in_bytes": 677119987052,
      "total_data_set_size_in_bytes": 677119987052,
      "reserved_in_bytes": 0
    },
    "fielddata": {
      "memory_size_in_bytes": 680,
      "evictions": 0
    },
    "query_cache": {
      "memory_size_in_bytes": 21241,
      "total_count": 339770,
      "hit_count": 19791,
      "miss_count": 319979,
      "cache_size": 14,
      "cache_count": 15,
      "evictions": 1
    },
    "completion": {
      "size_in_bytes": 0
    },
    "segments": {
      "count": 3361,
      "memory_in_bytes": 0,
      "terms_memory_in_bytes": 0,
      "stored_fields_memory_in_bytes": 0,
      "term_vectors_memory_in_bytes": 0,
      "norms_memory_in_bytes": 0,
      "points_memory_in_bytes": 0,
      "doc_values_memory_in_bytes": 0,
      "index_writer_memory_in_bytes": 692534852,
      "version_map_memory_in_bytes": 4470,
      "fixed_bit_set_memory_in_bytes": 23999600,
      "max_unsafe_auto_id_timestamp": 1671567831265,
      "file_sizes": {}
    },
    "mappings": {
      "total_field_count": 207586,
      "total_deduplicated_field_count": 132878,
      "total_deduplicated_mapping_size_in_bytes": 753438,
      "field_types": [
        {
          "name": "alias",
          "count": 2629,
          "index_count": 32,
          "script_count": 0
        },
        {
          "name": "binary",
          "count": 11,
          "index_count": 11,
          "script_count": 0
        },
        {
          "name": "boolean",
          "count": 2479,
          "index_count": 403,
          "script_count": 0
        },
        {
          "name": "byte",
          "count": 16,
          "index_count": 16,
          "script_count": 0
        },
        {
          "name": "constant_keyword",
          "count": 2018,
          "index_count": 431,
          "script_count": 0
        },
        {
          "name": "date",
          "count": 4039,
          "index_count": 453,
          "script_count": 0
        },
        {
          "name": "date_nanos",
          "count": 1,
          "index_count": 1,
          "script_count": 0
        },
        {
          "name": "date_range",
          "count": 11,
          "index_count": 11,
          "script_count": 0
        },
        {
          "name": "double",
          "count": 3874,
          "index_count": 24,
          "script_count": 0
        },
        {
          "name": "double_range",
          "count": 1,
          "index_count": 1,
          "script_count": 0
        },
        {
          "name": "flattened",
          "count": 558,
          "index_count": 63,
          "script_count": 0
        },
        {
          "name": "float",
          "count": 3265,
          "index_count": 89,
          "script_count": 0
        },
        {
          "name": "float_range",
          "count": 1,
          "index_count": 1,
          "script_count": 0
        },
        {
          "name": "geo_point",
          "count": 397,
          "index_count": 150,
          "script_count": 0
        },
        {
          "name": "geo_shape",
          "count": 1,
          "index_count": 1,
          "script_count": 0
        },
        {
          "name": "half_float",
          "count": 29,
          "index_count": 8,
          "script_count": 0
        },
        {
          "name": "integer",
          "count": 4,
          "index_count": 2,
          "script_count": 0
        },
        {
          "name": "integer_range",
          "count": 1,
          "index_count": 1,
          "script_count": 0
        },
        {
          "name": "ip",
          "count": 1850,
          "index_count": 416,
          "script_count": 0
        },
        {
          "name": "ip_range",
          "count": 6,
          "index_count": 6,
          "script_count": 0
        },
        {
          "name": "keyword",
          "count": 82913,
          "index_count": 453,
          "script_count": 0
        },
        {
          "name": "long",
          "count": 42042,
          "index_count": 403,
          "script_count": 0
        },
        {
          "name": "long_range",
          "count": 1,
          "index_count": 1,
          "script_count": 0
        },
        {
          "name": "match_only_text",
          "count": 2574,
          "index_count": 293,
          "script_count": 0
        },
        {
          "name": "nested",
          "count": 477,
          "index_count": 64,
          "script_count": 0
        },
        {
          "name": "object",
          "count": 54073,
          "index_count": 452,
          "script_count": 0
        },
        {
          "name": "scaled_float",
          "count": 2249,
          "index_count": 126,
          "script_count": 0
        },
        {
          "name": "shape",
          "count": 1,
          "index_count": 1,
          "script_count": 0
        },
        {
          "name": "short",
          "count": 559,
          "index_count": 18,
          "script_count": 0
        },
        {
          "name": "text",
          "count": 933,
          "index_count": 418,
          "script_count": 0
        },
        {
          "name": "version",
          "count": 13,
          "index_count": 13,
          "script_count": 0
        },
        {
          "name": "wildcard",
          "count": 560,
          "index_count": 99,
          "script_count": 0
        }
      ],
      "runtime_field_types": []
    },
    "analysis": {
      "char_filter_types": [],
      "tokenizer_types": [],
      "filter_types": [],
      "analyzer_types": [
        {
          "name": "pattern",
          "count": 15,
          "index_count": 15
        }
      ],
      "built_in_char_filters": [],
      "built_in_tokenizers": [],
      "built_in_filters": [],
      "built_in_analyzers": [
        {
          "name": "powershell_script_analyzer",
          "count": 15,
          "index_count": 15
        },
        {
          "name": "simple",
          "count": 20,
          "index_count": 10
        }
      ]
    },
    "versions": [
      {
        "version": "7.17.4",
        "index_count": 114,
        "primary_shard_count": 114,
        "total_primary_bytes": 207991069740
      },
      {
        "version": "7.17.5",
        "index_count": 5,
        "primary_shard_count": 5,
        "total_primary_bytes": 13945532195
      },
      {
        "version": "8.3.3",
        "index_count": 72,
        "primary_shard_count": 72,
        "total_primary_bytes": 195066488195
      },
      {
        "version": "8.4.1",
        "index_count": 106,
        "primary_shard_count": 106,
        "total_primary_bytes": 250218341706
      },
      {
        "version": "8.4.3",
        "index_count": 185,
        "primary_shard_count": 185,
        "total_primary_bytes": 9898555216
      }
    ]
  },
  "nodes": {
    "count": {
      "total": 1,
      "coordinating_only": 0,
      "data": 1,
      "data_cold": 1,
      "data_content": 1,
      "data_frozen": 1,
      "data_hot": 1,
      "data_warm": 1,
      "ingest": 1,
      "master": 1,
      "ml": 1,
      "remote_cluster_client": 1,
      "transform": 1,
      "voting_only": 0
    },
    "versions": [
      "8.4.3"
    ],
    "os": {
      "available_processors": 20,
      "allocated_processors": 20,
      "names": [
        {
          "name": "Linux",
          "count": 1
        }
      ],
      "pretty_names": [
        {
          "pretty_name": "Ubuntu 20.04.5 LTS",
          "count": 1
        }
      ],
      "architectures": [
        {
          "arch": "amd64",
          "count": 1
        }
      ],
      "mem": {
        "total_in_bytes": 82362953728,
        "adjusted_total_in_bytes": 82362953728,
        "free_in_bytes": 38228340736,
        "used_in_bytes": 44134612992,
        "free_percent": 46,
        "used_percent": 54
      }
    },
    "process": {
      "cpu": {
        "percent": 1
      },
      "open_file_descriptors": {
        "min": 4277,
        "max": 4277,
        "avg": 4277
      }
    },
    "jvm": {
      "max_uptime_in_millis": 18602429,
      "versions": [
        {
          "version": "18.0.2.1",
          "vm_name": "OpenJDK 64-Bit Server VM",
          "vm_version": "18.0.2.1+1-1",
          "vm_vendor": "Oracle Corporation",
          "bundled_jdk": true,
          "using_bundled_jdk": true,
          "count": 1
        }
      ],
      "mem": {
        "heap_used_in_bytes": 4326123944,
        "heap_max_in_bytes": 21474836480
      },
      "threads": 166
    },
    "fs": {
      "total_in_bytes": 2171038474240,
      "free_in_bytes": 1138080641024,
      "available_in_bytes": 1029024595968
    },
    "plugins": [],
    "network_types": {
      "transport_types": {
        "security4": 1
      },
      "http_types": {
        "security4": 1
      }
    },
    "discovery_types": {
      "single-node": 1
    },
    "packaging_types": [
      {
        "flavor": "default",
        "type": "deb",
        "count": 1
      }
    ],
    "ingest": {
      "number_of_pipelines": 101,
      "processor_stats": {
        "append": {
          "count": 940832,
          "failed": 0,
          "current": 0,
          "time_in_millis": 1060
        },
        "community_id": {
          "count": 1035970,
          "failed": 4170,
          "current": 0,
          "time_in_millis": 11088
        },
        "conditional": {
          "count": 40461689,
          "failed": 1295,
          "current": 0,
          "time_in_millis": 467178
        },
        "convert": {
          "count": 6053043,
          "failed": 8577,
          "current": 0,
          "time_in_millis": 11861
        },
        "date": {
          "count": 9334591,
          "failed": 0,
          "current": 0,
          "time_in_millis": 160411
        },
        "fingerprint": {
          "count": 9731,
          "failed": 0,
          "current": 0,
          "time_in_millis": 168
        },
        "foreach": {
          "count": 8374,
          "failed": 0,
          "current": 0,
          "time_in_millis": 26
        },
        "geoip": {
          "count": 5278548,
          "failed": 0,
          "current": 0,
          "time_in_millis": 3616
        },
        "grok": {
          "count": 1054809,
          "failed": 5199,
          "current": 0,
          "time_in_millis": 23768
        },
        "gsub": {
          "count": 7816045,
          "failed": 0,
          "current": 0,
          "time_in_millis": 18156
        },
        "json": {
          "count": 14693,
          "failed": 0,
          "current": 0,
          "time_in_millis": 311
        },
        "kv": {
          "count": 1031783,
          "failed": 0,
          "current": 0,
          "time_in_millis": 144896
        },
        "lowercase": {
          "count": 3070193,
          "failed": 551974,
          "current": 0,
          "time_in_millis": 12560
        },
        "pipeline": {
          "count": 11653973,
          "failed": 0,
          "current": 0,
          "time_in_millis": 12916267557315
        },
        "registered_domain": {
          "count": 4187,
          "failed": 0,
          "current": 0,
          "time_in_millis": 34
        },
        "remove": {
          "count": 21817711,
          "failed": 0,
          "current": 0,
          "time_in_millis": 35143
        },
        "rename": {
          "count": 23336559,
          "failed": 1692981,
          "current": 0,
          "time_in_millis": 46215
        },
        "script": {
          "count": 48860966,
          "failed": 106,
          "current": 0,
          "time_in_millis": 77806
        },
        "set": {
          "count": 14048879,
          "failed": 820423,
          "current": 0,
          "time_in_millis": 32166
        },
        "set_security_user": {
          "count": 9324860,
          "failed": 0,
          "current": 0,
          "time_in_millis": 76546
        },
        "trim": {
          "count": 1881664,
          "failed": 0,
          "current": 0,
          "time_in_millis": 2458
        },
        "uppercase": {
          "count": 2063566,
          "failed": 0,
          "current": 0,
          "time_in_millis": 2293
        },
        "urldecode": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "user_agent": {
          "count": 65795,
          "failed": 0,
          "current": 0,
          "time_in_millis": 283
        }
      }
    },
    "indexing_pressure": {
      "memory": {
        "current": {
          "combined_coordinating_and_primary_in_bytes": 0,
          "coordinating_in_bytes": 0,
          "primary_in_bytes": 0,
          "replica_in_bytes": 0,
          "all_in_bytes": 0
        },
        "total": {
          "combined_coordinating_and_primary_in_bytes": 0,
          "coordinating_in_bytes": 0,
          "primary_in_bytes": 0,
          "replica_in_bytes": 0,
          "all_in_bytes": 0,
          "coordinating_rejections": 0,
          "primary_rejections": 0,
          "replica_rejections": 0
        },
        "limit_in_bytes": 0
      }
    }
  }
}

Can you increase the heap at all? That might help it get back up and running.
What do the Elasticsearch logs show?
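
On a deb install the heap is usually overridden with a small file under /etc/elasticsearch/jvm.options.d/. The values below are only an example; keep Xms and Xmx equal, below roughly half the machine's RAM, and under ~31 GB to keep compressed object pointers:

# /etc/elasticsearch/jvm.options.d/heap.options  (example values)
-Xms24g
-Xmx24g

# then restart the node
sudo systemctl restart elasticsearch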

Hello Mark, I think I found the problem. After changing the heap size and restarting Elasticsearch, the logs now show the following error:

[2022-12-21T07:49:39,576][WARN ][o.e.c.r.a.AllocationService] [slave1] failing shard [FailedShard[routingEntry=[.ds-metrics-system.network-system-2022.09.16-000002][0], node[4kvPwhx_TC23UtnGO7eeiA], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[INITIALIZING], a[id=8JfjOtMJTAK90xeKp7vEhw], unassigned_info[[reason=CLUSTER_RECOVERED], at[2022-12-21T12:48:50.517Z], delayed=false, allocation_status[fetching_shard_data]], message=shard failure, reason [failed to recover from translog], failure=[.ds-metrics-system.network-system-2022.09.16-000002/BMvmNpasTDCdDb_-9lN0ug][[.ds-metrics-system.network-system-2022.09.16-000002][0]] org.elasticsearch.index.engine.EngineException: failed to recover from translog, markAsStale=true]]
org.elasticsearch.index.engine.EngineException: failed to recover from translog
        at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:495) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:468) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:113) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1897) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:463) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:90) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:462) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:88) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:2241) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:769) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.4.3.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog from source [/var/lib/elasticsearch/indices/BMvmNpasTDCdDb_-9lN0ug/0/translog/translog-647.tlog] is corrupted, checksum verification failed - expected: 0xd1b605cf, got: 0x0
        at org.elasticsearch.index.translog.Translog.verifyChecksum(Translog.java:1560) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.translog.Translog.readOperation(Translog.java:1593) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.translog.BaseTranslogReader.read(BaseTranslogReader.java:110) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.translog.TranslogSnapshot.readOperation(TranslogSnapshot.java:71) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.translog.TranslogSnapshot.next(TranslogSnapshot.java:59) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.translog.MultiSnapshot.next(MultiSnapshot.java:60) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.translog.Translog$SeqNoFilterSnapshot.next(Translog.java:1043) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.shard.IndexShard.runTranslogRecovery(IndexShard.java:1837) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.shard.IndexShard.lambda$openEngineAndRecoverFromTranslog$13(IndexShard.java:1888) ~[elasticsearch-8.4.3.jar:?]
        at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:493) ~[elasticsearch-8.4.3.jar:?]
        ... 14 more
[2022-12-21T07:49:47,907][WARN ][o.e.x.s.a.ApiKeyAuthenticator] [slave1] Authentication using apikey failed - unable to find apikey with id Y0BSQIIBsXhy0CK40VOC
[2022-12-21T07:49:52,079][WARN ][o.e.x.s.a.ApiKeyAuthenticator] [slave1] Authentication using apikey failed - unable to find apikey with id Y0BSQIIBsXhy0CK40VOC

So I am starting to think that the translog became corrupted when the backup was restored to the new machine, and that is what is holding Kibana up. Also, I still have no idea why it mentions an API key, as I never set one up; I know I should for security reasons and that is on the list, but I just never did it.

Do you know of a way to fix this corruption Mark?
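
From skimming the docs, it looks like the elasticsearch-shard tool might be able to drop the corrupted part of the translog, at the cost of losing whatever operations it contained. I have not run it yet, so this is just my rough understanding of what the invocation would look like on our deb install:

# stop the node first, then run the tool as the elasticsearch user
sudo systemctl stop elasticsearch
sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch-shard remove-corrupted-data \
  --index .ds-metrics-system.network-system-2022.09.16-000002 --shard-id 0
sudo systemctl start elasticsearch
# the tool then prints a cluster reroute command (allocate_stale_primary with
# accept_data_loss) that has to be run against the API afterwards

Does that look like the right approach, or is there something safer?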

Also, sorry for the late reply. It was night time where I am, and I was already asleep, as I work an early shift.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.