This is broader than Elasticsearch alone, but I don't see a category for questions about the whole stack.
I need to set up a central logging cluster in a relatively small environment. We have about 500 servers that primarily serve 3-tier web applications and file shares. Our servers are split pretty evenly between Linux (mostly RHEL) and Windows (mostly Server 2016).
My requirements are fairly simple:
Ship security logs to QRadar, controlled by InfoSec, in syslog format. (I have no influence over QRadar, so even if it can accept other message types, that's not going to happen.)
Ship security and system logs to our department's logging cluster, and store them for a minimum of 90 days (180 or more would be preferred, though). Including application logs is an option, but not a requirement.
I have a general plan in mind, but I'm not sure it's the most efficient way of going about this, and I'd like to get some advice.
Windows servers will forward events to a central Windows Event Collector, which will then send them to Logstash to convert the messages to syslog for QRadar and also output to Elasticsearch. I assume Winlogbeat, running on the collector, is the best option there.
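Roughly what I have in mind on the Logstash side is the sketch below. Hostnames and ports are placeholders, I haven't validated any of this, and I believe the syslog output is a plugin I'd need to install separately:

```
# Sketch: Winlogbeat (on the event collector) -> Logstash -> QRadar + Elasticsearch
input {
  beats {
    port => 5044
  }
}
output {
  # syslog output plugin (logstash-output-syslog); QRadar address is a placeholder
  syslog {
    host     => "qradar.example.com"
    port     => 514
    protocol => "tcp"
  }
  elasticsearch {
    hosts => ["es01.example.com:9200"]
  }
}
```

Does a dual-output pipeline like this look sane, or should the QRadar and Elasticsearch paths be separate pipelines?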
For our Linux environment, I wanted to use Auditbeat to ship security information to Logstash, so I could then forward to both QRadar and Elasticsearch. However, I couldn't get Auditbeat to log locally and send to Logstash at the same time, and I couldn't get auditd and Auditbeat to run together. I read a comment from one of the devs on GitHub saying that running them together isn't the intended use anyway, so that's fine; and since RHEL 7 uses a 3.10.* kernel, I guess that's expected.
My current plan is to send auth, authpriv, and whatever else I need to QRadar through rsyslog, and maybe use Filebeat for sending to Elasticsearch, which brings me to my questions.
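For the rsyslog side, I'm thinking of a drop-in along these lines (the QRadar address is a placeholder, and I'd adjust the selectors once InfoSec tells me what they actually want):

```
# /etc/rsyslog.d/qradar.conf — forward security facilities to QRadar
# "@@" means TCP; a single "@" would be UDP
auth,authpriv.*  @@qradar.example.com:514
```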
What are the benefits of using Filebeat instead of sending rsyslog output to ingest nodes to do the parsing (besides reduced computational overhead on the logging cluster)?
I'm extremely confused about the index and shard allocations. If I use Filebeat to send to Elasticsearch, do I need to configure the index templates in both Filebeat and Elasticsearch, or can I manage that entirely from Elasticsearch? (Sorry if this is answered in the docs, but I've read through a sizeable chunk of the documentation for several components, and I've found information on how they work together to be a bit sparse.)
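For reference, these are the two knobs I've found so far, though I may be misreading them. On the Filebeat side there's a setting to stop it from loading its own template:

```
# filebeat.yml — let Elasticsearch own the template instead of Filebeat
setup.template.enabled: false
```

and on the Elasticsearch side I could define the template myself (index pattern and shard counts below are just examples):

```
PUT _template/filebeat
{
  "index_patterns": ["filebeat-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
```

Is that the right split, or does Filebeat need its template config to match regardless?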
Given the described environment (primarily 3-tier web services, and file shares), what kind of sizing requirements might I be looking at?
I know that's a really difficult question to answer, but from what I've seen, the requirements might be a lot higher than we anticipated.
I'm actually rebuilding the cluster because it wasn't functioning correctly after the guy I replaced built it. Since I've never worked with this before and we didn't need the data, I thought it'd be better to start with a fresh build. The point is, we had a bunch of servers pointed at the three servers I'm using, and when I turned Logstash on and wrote its output to a file, the file grew to around 40 GB in about an hour.
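To put that in perspective, here's my back-of-envelope math, assuming the 40 GB/hour rate holds around the clock and one replica per shard (both assumptions may well be wrong):

```python
# Rough storage estimate from the ~40 GB/hour I observed out of Logstash.
hourly_gb = 40
daily_gb = hourly_gb * 24                    # raw GB per day
retention_days = 90                          # our minimum retention
raw_tb = daily_gb * retention_days / 1000    # TB for 90 days, before replication
with_replica_tb = raw_tb * 2                 # one replica doubles the footprint
print(daily_gb, raw_tb, with_replica_tb)
```

That comes out to roughly 86 TB raw over 90 days, double that with a replica, which is why I'm worried our sizing assumptions were way off.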
Does Elasticsearch store data more efficiently than the raw files Logstash writes out, or is it reasonable to expect a similar data volume in both?