I know this is a very vague question and it might have been answered a couple of times before, but I wanted another pair of eyes to look it over and vet it.
I am planning to build a log management/security analytics solution and will be collecting logs from around 100 devices (servers/routers/switches/firewalls), which I believe should not generate more than 15-20 GB per day.
I am planning on 4 nodes:
1x Logstash - 24 GB RAM
2x ES nodes (cluster) - 32 GB RAM each, with 3 TB of disk space
1x Wazuh/OSSEC node with 16 GB RAM, accepting messages from the servers and sending them directly to ES
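A quick way to sanity-check the disk plan is to turn the daily volume into a retention estimate. This is a back-of-the-envelope sketch, not a sizing tool: it assumes one replica, reads the 3 TB as per node (if it is the total, halve the results), assumes on-disk size roughly equals raw log volume (the real ratio depends on mappings and compression), and only fills disks to 85%, the default low disk watermark.

```python
def retention_days(daily_gb: float, nodes: int, disk_tb_per_node: float,
                   replicas: int = 1, usable_fraction: float = 0.85) -> float:
    """Estimate how many days of logs fit on the cluster.

    Assumption: stored size ~= ingested size. usable_fraction mirrors the
    default 85% low disk watermark, above which Elasticsearch stops
    allocating new shards to a node.
    """
    usable_gb = nodes * disk_tb_per_node * 1000 * usable_fraction
    stored_per_day_gb = daily_gb * (1 + replicas)  # primary + replica copies
    return usable_gb / stored_per_day_gb

for daily in (15, 20):
    days = retention_days(daily, nodes=2, disk_tb_per_node=3)
    print(f"{daily} GB/day -> roughly {days:.0f} days of retention")
```

Changing `replicas` or the per-node disk shows quickly how much each decision costs in retention.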
Now my questions are:
How many shards should be configured? Is 5 enough?
What should HEAP_SIZE be on ES? With 32 GB of RAM, is 16 GB enough?
Can I install Kibana on the primary Elasticsearch node, or do I need to install it on a different server?
Should Kibana connect to the primary node of the cluster?
Similarly, will Logstash send data to the primary node as well?
Considering future growth and shard numbers, I can add more ES nodes to the cluster, right?
Any other optimization tips would be really appreciated.
1 sounds enough (and this is the default in recent versions). 5 sounds like too many. Use index lifecycle management (ILM) to roll over to a new index when the current one reaches a reasonable size (say 40GB).
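For illustration, here is a sketch that builds the request body for `PUT _ilm/policy/<name>` with a rollover at 40 GB. The 90-day delete phase is a hypothetical retention choice, not part of the advice above, and the policy still has to be attached to your indices (for example through an index template), which is not shown.

```python
import json
from typing import Optional

def rollover_policy(max_size: str = "40gb", delete_after: Optional[str] = None) -> dict:
    """Build an ILM policy body that rolls the write index over at max_size."""
    phases = {"hot": {"actions": {"rollover": {"max_size": max_size}}}}
    if delete_after is not None:
        # Optional retention: for rolled-over indices, min_age counts from rollover.
        phases["delete"] = {"min_age": delete_after, "actions": {"delete": {}}}
    return {"policy": {"phases": phases}}

# "90d" is a hypothetical retention period; size it against your disk budget.
print(json.dumps(rollover_policy(delete_after="90d"), indent=2))
```

Rolling over by size keeps each index (and its single primary shard) at a predictable size no matter how the daily volume changes.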
If your machines have 32GB of RAM, then 16GB (50%) is the absolute maximum you should use. You may get better performance with a smaller heap.
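The 50% rule can be written down as a small helper that emits the two heap lines for Elasticsearch's jvm.options file. It is a sketch under stated assumptions: the 26 GB cap is an assumed conservative value below the roughly 32 GB compressed-oops threshold, and the minimum and maximum heap are set equal, as Elastic recommends.

```python
from typing import List

def heap_options(ram_gb: int, oops_cap_gb: int = 26) -> List[str]:
    """Return jvm.options heap lines for a node with ram_gb of RAM.

    Uses at most 50% of RAM, leaving the rest for the OS filesystem cache
    that Lucene relies on, and never exceeds oops_cap_gb (an assumption).
    """
    heap_gb = min(ram_gb // 2, oops_cap_gb)
    # Equal min and max so the heap is never resized at runtime.
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]

print("\n".join(heap_options(32)))  # -Xms16g, then -Xmx16g
```

As the answer notes, 16 GB is an upper bound for a 32 GB machine; passing a smaller value and measuring is a reasonable next step.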
There's no such thing as a "primary node". Both nodes are equal from Elasticsearch's point of view, and you can send data and searches to either.
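Because any node can coordinate a request, clients are usually given the full list of nodes: Kibana's `elasticsearch.hosts` setting and the Logstash elasticsearch output's `hosts` option both accept a list. The sketch below only illustrates the idea with a plain round-robin selector. The hostnames are hypothetical, and real clients also handle retries and dead nodes.

```python
from itertools import cycle
from typing import Iterator, List

class RoundRobinNodes:
    """Hand out node URLs in turn; any node can coordinate searches or bulk requests."""

    def __init__(self, hosts: List[str]) -> None:
        if not hosts:
            raise ValueError("need at least one host")
        self._cycle: Iterator[str] = cycle(hosts)

    def next_host(self) -> str:
        return next(self._cycle)

# Hypothetical node URLs for the two-node cluster.
nodes = RoundRobinNodes(["http://es-node-1:9200", "http://es-node-2:9200"])
for _ in range(4):
    print(nodes.next_host())
```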
Yes. Note that you cannot build a fault-tolerant cluster with only two nodes - you need at least three master-eligible nodes for resilience. But after that you can add data nodes as needed. You might also want to segregate your data nodes into a hot/warm architecture.
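The reason two nodes are not enough is majority voting: a master can only be elected when a majority of the master-eligible nodes agree. A simplified calculation that ignores Elasticsearch's automatic voting-configuration adjustments shows that two nodes tolerate no losses, while three tolerate one:

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1

def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)

for n in (1, 2, 3, 5):
    print(f"{n} master-eligible: quorum {quorum(n)}, survives {tolerated_failures(n)} loss(es)")
```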
If I am not wrong, one shard will not span multiple nodes, correct? And considering future growth, if the data volume increases and I introduce 2 more nodes, 1 shard will not be enough, right?
OK, I think I see. By "number of shards" I mean the number set by the index.number_of_shards setting. 1 is the default for that and that sounds reasonable for your case. But for fault tolerance each shard must have a replica, and this is controlled by the independent index.number_of_replicas setting. 1 is the default for that too, meaning that each shard will have a copy on both nodes, and that sounds good for you too.
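The arithmetic behind these two settings is simple: an index occupies number_of_shards × (1 + number_of_replicas) shard copies. The sketch below also includes a toy placement function. It is an illustration only, not Elasticsearch's real allocator, and the node names are hypothetical. It shows that with rollover, each new one-shard index can land on different nodes, so a single primary per index does not stop you from using nodes you add later.

```python
from typing import Dict, List

def shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Total shard copies an index occupies: each primary plus its replicas."""
    return number_of_shards * (1 + number_of_replicas)

def toy_allocate(index_count: int, nodes: List[str], replicas: int = 1) -> Dict[str, int]:
    """Toy placement (NOT Elasticsearch's allocator): put each index's copies on
    the least-loaded distinct nodes, so a primary and its replica never share a node."""
    if replicas + 1 > len(nodes):
        raise ValueError("not enough nodes to keep primary and replicas apart")
    load = {n: 0 for n in nodes}
    for _ in range(index_count):
        targets = sorted(nodes, key=lambda n: load[n])[: replicas + 1]
        for n in targets:
            load[n] += 1
    return load

print(shard_copies(1, 1))  # 2
print(toy_allocate(4, ["es1", "es2"]))
print(toy_allocate(4, ["es1", "es2", "es3", "es4"]))
```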
As for the ingest node, I am not sure whether it can listen on a port for incoming data from my network devices. From the servers it would work, since I will be installing shippers there, but what about the network devices?
Packetbeat is also a library. It supports many application layer protocols, from database to key-value stores to HTTP and low-level protocols. Choose the one you need or add your own by submitting a pull request.