I am a self-funded student, and I need to use this system as part of my final year pursuing a master's in software and systems security. I would like assistance calculating system requirements, assuming the load on the system remains consistent. I am running the Elastic Stack on this system and am facing the following issues:
System load is always over 4.0. Is this due to an IOPS issue? If so, will adding RAM during migration to another system help, or does it require faster storage such as an SSD?
Will the following configuration reduce the load and improve performance without over-provisioning?
Compute: 3 processors × 2 cores per processor = 6 cores
Memory: 16 GB RAM
Storage: local (SATA, 7200 RPM NAS-grade drive)
The underlying operating system will continue to be Ubuntu 20.04 LTS. I want to learn the Elastic Stack comprehensively, so I am hoping to set up multiple nodes to test node roles, query scheduling, etc., with a view to using the Elastic Stack as an enterprise SIEM.
The current primary purpose of the system is to ingest logs from 50 honeypots deployed around the world in AWS and Azure, averaging 500 EPS (events per second). These events go through a Logstash pipeline running on a Raspberry Pi before being sent to the Elastic Stack.
What steps should I take to ensure I am not over-provisioning and can use the system for other projects too?
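For reference, here is the back-of-the-envelope ingest estimate I am working from. It's a minimal sketch, and the event size is my own guess of roughly 1 KB, not something I have measured:

```python
# Rough daily ingest estimate for capacity planning.
# Assumptions (mine, not measured): 500 EPS sustained, ~1 KB per event,
# 1 replica, indexing overhead roughly offset by compression.
EPS = 500
EVENT_SIZE_BYTES = 1024
REPLICAS = 1

daily_events = EPS * 60 * 60 * 24              # 43,200,000 events/day
daily_bytes = daily_events * EVENT_SIZE_BYTES  # ~44 GB/day raw
with_replicas = daily_bytes * (1 + REPLICAS)   # ~88 GB/day on disk

print(f"{daily_events:,} events/day")
print(f"{daily_bytes / 1e9:.1f} GB/day raw, "
      f"{with_replicas / 1e9:.1f} GB/day with {REPLICAS} replica(s)")
```

If those assumptions are anywhere near right, a single spinning disk would fill up and fall behind fairly quickly, which is why I'm asking about storage.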
Any reason not to run this on Docker? It's way more flexible for controlling versions, allocations, etc. 4 GB is not a lot for a real ELK stack, though your 64 GB workstation is nice.
IOWait that high at only 50 IOPS suggests small I/O operations and possibly fsync() calls going on; an SSD will certainly fix that.
Yeah, I'm a little confused about your goals: if you are just starting out, you have a ways to go before having an enterprise SIEM. And 500 EPS is not high, but it could be for spinning disks; it's hard to know, as they are so rare these days on primary data stores, and all you can do is test and benchmark.
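If you want to confirm the box is actually disk-bound before migrating, here's a minimal sketch using Python's psutil (an assumption on my part that you can install it with `pip install psutil`; the iowait field is Linux-only):

```python
import psutil  # third-party: pip install psutil

# Sample CPU time percentages a few times; on Linux, 'iowait' is the
# share of time the CPU sat idle waiting on disk I/O to complete.
for _ in range(5):
    cpu = psutil.cpu_times_percent(interval=2)
    print(f"user={cpu.user:5.1f}%  system={cpu.system:5.1f}%  "
          f"iowait={cpu.iowait:5.1f}%  idle={cpu.idle:5.1f}%")
    # Consistently high iowait (say, >20%) alongside a high load average
    # points at the spinning disk, not CPU or RAM, as the bottleneck.
```

You can get the same picture from `iostat` or `vmstat`; the point is to measure before buying hardware.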
So my goal is to carry out my final year research on security monitoring. I've selected ELK as one of the platforms to benchmark and compare against traditional SIEMs such as IBM's QRadar.
As part of my research, I have a series of honeypot systems deployed on AWS/Azure/Google Cloud that send telemetry data. Currently I have 25+ such systems, including a few of my family's laptops, which also log NetFlow data. I hope this clears up my use case; I apologise for the ambiguity in my initial post.
You can calculate EPS by running cluster stats a few seconds apart, taking the difference in indices.docs.count, and dividing it by the elapsed time. Our ELKman.io tool (which you can use for free) also shows this on its dashboard.
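For example, a minimal sketch against the REST API (assuming an unsecured cluster at localhost:9200; adjust the URL and add auth for your setup):

```python
import time
import requests  # third-party: pip install requests

ES = "http://localhost:9200"  # adjust to your cluster's address

def doc_count():
    # _cluster/stats reports the total document count across all indices
    stats = requests.get(f"{ES}/_cluster/stats").json()
    return stats["indices"]["docs"]["count"]

INTERVAL = 30  # seconds between the two samples
first = doc_count()
time.sleep(INTERVAL)
second = doc_count()

# EPS = documents added during the interval / interval length
print(f"~{(second - first) / INTERVAL:.1f} events per second")
```

Note this counts deletions and updates into the delta too, so run it a few times during steady ingest for a usable average.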