We have a 2-master, 5-slave Elasticsearch cluster collecting logs from a ton of different microservice servers. Although indexing has never been a problem, our Kibana occasionally goes down due to extremely long timeouts. Sometimes I have been able to track these problems back to EXTREMELY large individual documents ruining query times. Typically these have been the result of a faulty multiline filter.
However, here is my problem: sometimes when we get these timeout issues, I don't know how to identify which server is producing the massive logs, because we have so many. Since most of our logs go into the same daily index, is there any way to identify, based on source (we have a "source" field in our logs) or something else, which server is producing the problem logs that are freezing our queries?
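To make the question concrete, here is a rough sketch of the kind of per-source breakdown I am after. It assumes the hits have already been pulled out of the daily index as plain dicts in the usual search-response shape (each with a `_source` key) and that the "source" field is a simple string. The function name `largest_docs_by_source` and the sample values are made up for illustration, and it measures the JSON-serialized size of each document, which is only an approximation of what is actually stored.

```python
import json
from collections import defaultdict


def largest_docs_by_source(hits, source_field="source", top_n=5):
    """Rank sources by the size of their largest document.

    `hits` is assumed to be an iterable of dicts shaped like search hits,
    i.e. {"_source": {...}}. Size is the UTF-8 length of the JSON-serialized
    document, which approximates (but is not identical to) the stored size.
    """
    stats = defaultdict(lambda: {"count": 0, "total_bytes": 0, "max_bytes": 0})
    for hit in hits:
        doc = hit.get("_source", {})
        size = len(json.dumps(doc).encode("utf-8"))
        # Documents without the field are grouped under a placeholder key.
        entry = stats[doc.get(source_field, "<missing>")]
        entry["count"] += 1
        entry["total_bytes"] += size
        entry["max_bytes"] = max(entry["max_bytes"], size)
    # Largest single document first, since one huge document is the suspect.
    ranked = sorted(stats.items(), key=lambda kv: kv[1]["max_bytes"], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical sample data standing in for real hits.
    sample_hits = [
        {"_source": {"source": "svc-a", "message": "ok"}},
        {"_source": {"source": "svc-b", "message": "x" * 100000}},
        {"_source": {"source": "svc-a", "message": "still ok"}},
    ]
    for source, s in largest_docs_by_source(sample_hits):
        print(source, s["max_bytes"], s["count"])
```

Doing this client-side means pulling the documents over the wire first, which is exactly what hurts when they are huge, so what I would really like is a way to get the same answer from the cluster itself.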
Any help on this would be massively appreciated!!