Hello,
I'm new to the whole ELK stack (< 1 week), and let me say I am impressed with just what this tool is capable of.
Currently, on my proof-of-concept Vagrant machine (2 vCPUs, 8 GB RAM), I have ELK up and running, with Packetbeat and Topbeat generating data that would be useful if we scaled it up and rolled it out into production.
Now, let me skip ahead to where I want to be with it:
Ideally, we want to be able to scale this out to monitor at least 150 Windows and Linux servers. We want to be able to monitor upwards of 75 Oracle databases (alert log, tablespace, running SQL, blocking sessions) and 25 MSSQL databases. (Let's ignore the Microsoft ones for now; my team and I don't look after them.) We want to be able to gather SAN metrics (a work in progress from one of my colleagues; it is coming together pretty well, but we will feed it into Elastic afterwards). And to top it off (for now), we also want to monitor our Oracle Business Intelligence and Fusion Middleware stacks.
So let's start with the simple questions:
1 - For an environment "that big", what would my ELK cluster look like in terms of size? That way, I can sell the goal if we proceed.
2 - I'm having some issues with Logstash when monitoring a single Oracle database. Logstash takes 30 seconds to "warm up", then generates data for a further 45 seconds before promptly imploding. The error I've captured is:
Attempted to send a bulk request to Elasticsearch configured at '["http://localhost:9200/"]', but an error occurred and it failed! Are you sure you can reach elasticsearch from this machine using the configuration provided? {:error_message=>"Cannot serialize instance of: Sequel::SQL::Blob",
This is just a starting point; I'm sure I'll come up with more questions as I continue to learn more about it.
Cheers.