Troubleshooting resources?

Hi folks,
I have an 8-node cluster with one Logstash server collecting Winlogbeat data. It had been running fine until recently. I added more endpoints, and then I noticed in Kibana that all the events stopped at the same time.

What is the best way to find the errors or the cause of this? I've run into something similar before, where the daily index size was too small, so I set it to weekly. Deleting the data and restarting got data flowing again.

I'm reviewing the logs in /var/log/elasticsearch but haven't found any indication of an error yet. Any tips or pointers? My Elasticsearch troubleshooting skills are weak.

Don't forget the Kibana and Logstash logs, too. In fact, I'd start with Kibana, then go to Logstash, and then check the Elasticsearch node logs, because if everything stopped at once you should look for the common failure point. Note that you may not have a /var/log/kibana unless logging to it is specifically set up in kibana.yml, so you may need to add that first and restart Kibana.
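If it helps, here is a rough Python sketch for pulling only the ERROR and WARN entries out of those logs so they can be compared across services. It assumes the bracketed `[timestamp][LEVEL][logger] message` line layout that Elasticsearch and Logstash plain-text logs commonly use; the function names are made up for illustration, and Kibana's log format may differ, so treat the regex as a starting point.

```python
import re
from pathlib import Path

# Assumed line layout: [timestamp][LEVEL][logger] message
# The level and logger fields may be padded with spaces.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]\[(?P<logger>[^\]]+)\]\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, logger, message), or None for non-matching lines."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None  # e.g. stack-trace continuation lines
    return (
        match.group("ts"),
        match.group("level"),
        match.group("logger").strip(),
        match.group("msg"),
    )


def find_problems(lines, levels=("ERROR", "WARN")):
    """Yield parsed entries whose level is one of `levels`."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and parsed[1] in levels:
            yield parsed


def scan_dir(log_dir, pattern="*.log"):
    """Yield (file name, entry) for every problem line in matching files."""
    for path in sorted(Path(log_dir).glob(pattern)):
        with path.open(errors="replace") as handle:
            for entry in find_problems(handle):
                yield path.name, entry
```

Something like `for name, entry in scan_dir("/var/log/logstash"): print(name, *entry)` would then list the problem lines per file. Sorting the combined output by timestamp makes it easier to see which service complained first.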

Thanks for the tip on Logstash. I found one error there, but that didn't resolve it. I found one more in Elasticsearch but haven't had any luck digging up the solution yet.

[2021-01-05T19:18:19,533][ERROR][o.e.x.s.a.s.m.NativeRoleMappingStore] [usta-elastic-01] failed to load role mappings from index [.security] skipping all mappings.
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
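An "all shards failed" error usually means the shards behind the index could not be searched, so a reasonable next step is to ask the cluster which shards are unassigned and why. Below is a minimal Python sketch using only the standard library against the `_cluster/health` and `_cat/shards` REST endpoints. The base URL, the credentials, and the helper names are assumptions for illustration; a cluster with TLS and a self-signed certificate would need extra SSL handling that is not shown here.

```python
import base64
import json
import urllib.request


def fetch_json(base_url, path, user=None, password=None):
    """GET a JSON document from the Elasticsearch REST API (basic auth optional)."""
    request = urllib.request.Request(base_url.rstrip("/") + path)
    if user is not None:
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize_health(health):
    """Condense a _cluster/health response into one line."""
    return f"status={health['status']} unassigned_shards={health['unassigned_shards']}"


def unassigned(rows, index_prefix=""):
    """Filter _cat/shards rows down to UNASSIGNED shards, optionally by index prefix."""
    return [
        row
        for row in rows
        if row.get("state") == "UNASSIGNED"
        and row.get("index", "").startswith(index_prefix)
    ]


def report(base_url, user=None, password=None):
    """Print cluster status and every unassigned shard with its reason."""
    print(summarize_health(fetch_json(base_url, "/_cluster/health", user, password)))
    columns = "index,shard,prirep,state,unassigned.reason"
    rows = fetch_json(base_url, f"/_cat/shards?format=json&h={columns}", user, password)
    for row in unassigned(rows):
        print(row["index"], row["shard"], row["prirep"], row.get("unassigned.reason"))
```

Calling `report("http://localhost:9200", "elastic", "<password>")` (host, port, and user are placeholders) prints the cluster status followed by one line per unassigned shard. Passing `index_prefix=".security"` to `unassigned` narrows the list to the index named in the error.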

Do you have Monitoring enabled?

Yes, I do!

Does it show anything at the time you see the other issue?

What do your Elasticsearch, Logstash, and Beats logs show?

I ended up finding out it was a combination of incorrect permissions and insufficient storage on the nodes (the disk watermark kicked in). Thank you to all who replied. Data flows as expected now.
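For anyone who lands here with the same symptom: when a node crosses the flood-stage disk watermark, Elasticsearch marks the affected indices read-only (`index.blocks.read_only_allow_delete`), which can make all indexing stop at once. The sketch below classifies nodes from the output of `GET _cat/allocation?format=json`. It assumes the default watermarks of 85% (low), 90% (high), and 95% (flood stage) have not been overridden in the cluster settings; the function names are hypothetical.

```python
# Default disk watermarks in percent of disk used.
# Assumption: cluster.routing.allocation.disk.watermark.* is not overridden.
LOW, HIGH, FLOOD = 85, 90, 95


def classify(percent):
    """Map a disk-used percentage to the watermark level it has reached."""
    if percent >= FLOOD:
        return "flood_stage"
    if percent >= HIGH:
        return "high"
    if percent >= LOW:
        return "low"
    return "ok"


def check_allocation(rows):
    """Return {node name: watermark level} from _cat/allocation JSON rows.

    disk.percent is a rounded value, so results near a threshold are approximate.
    """
    levels = {}
    for row in rows:
        percent = row.get("disk.percent")
        if percent is None:
            continue  # the UNASSIGNED pseudo-row carries no disk figures
        levels[row["node"]] = classify(int(percent))
    return levels
```

Any node reported as `high` or `flood_stage` is worth freeing space on or expanding before restarting ingestion. On older Elasticsearch versions the read-only block may also need to be cleared manually after disk space is freed.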
