This is my first post. I have 10GB/day of centralized syslog from CentOS systems. The log files can be accessed locally or via an NFS mount. I need at least two fields (timestamp and host) to filter the logs. There are fewer than five total users.
What kind of setup would you recommend?
What are the minimum hardware requirements? Multiple ingest/data nodes?
Will Logstash be the bottleneck?
How do I scale up if I have more logs in the future?
Since I need to filter data by host and timestamp, I assume Filebeat cannot be used.
Do I have to use Logstash? If so, is grok the only way to add the timestamp and host fields?
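For what it's worth, grok is one common way to pull those fields out, but not the only one (Logstash also has `dissect`, and a `syslog` input that parses standard syslog lines for you). A minimal pipeline sketch for reading the centralized files and extracting timestamp and host might look like this; the file path is hypothetical, and it assumes RFC3164-style syslog lines:

```conf
input {
  file {
    # hypothetical path to the centralized syslog files (local or NFS mount)
    path => "/var/log/remote/*.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    # standard RFC3164 shape: "<timestamp> <host> <rest of message>"
    match => {
      "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{GREEDYDATA:syslog_message}"
    }
  }
  date {
    # set @timestamp from the parsed syslog timestamp
    match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```

With this, every event carries `syslog_hostname` and a proper `@timestamp`, so filtering by host and time range works directly in Kibana queries.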
Thank you for the links.
Let's say I want a target shard size of 25GB and at most 20 shards per GB of heap, but where do I actually define the shard size and the number of shards?
If I use the default settings (5 primary shards and 1 replica per index) for my time-based daily indices of centralized logs, should I have:
1. 3 to 5 nodes with default settings (all with node.master/node.data/node.ingest set to true)
2. 1 master node, two data nodes, and two ingest nodes
3. Another layout, like 1 coordinating-only node and 3 master-eligible nodes...
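On the shard-size question: shard size is not a setting you define directly; you control the number of primary shards per index (via an index template), and the shard size then follows from how much data each index receives. A sketch of a template for daily syslog indices, assuming an index naming pattern of `syslog-*` (the template name and pattern are illustrative):

```json
PUT _template/syslog
{
  "index_patterns": ["syslog-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
```

At roughly 10GB/day, a single primary shard per daily index lands around 10GB, comfortably under a 25GB target; the "20 shards per GB" figure is the usual guideline of keeping fewer than 20 shards per GB of JVM heap on each node, which bounds how many daily indices a node can hold before you need to add nodes or switch to rollover-based indices.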