Hey,
We are using the ELK stack for detection and alerting in incident response cases. Some time ago a colleague created a Python framework to automate the creation and configuration of the complete ELK stack plus RabbitMQ within a VMware ESXi environment. The script works fine so far.
Since the script uses "old" versions (ES 1.4) and we are not happy with the indexing speed, I am thinking about updating all components to newer versions to make indexing faster. Part of this "improvement" is, or might be, new or additional hardware.
While testing each component I found a "strange" behavior with Logstash in our virtual environment. It looks like we are losing a lot of performance due to the virtual environment.
My setup
Hardware
First Logstash System, virtual:
- ESXi 6 host (newest version) with 40 cores (4x10x2.20GHz), 384 GB RAM, 24x10k 900GB HDD in RAID 10 (default case) or 1x Samsung SM863 960GB (test case).
- Only one Logstash VM installed on the host. 16 Cores, 64 GB RAM.
Second Logstash System, physical:
- 4 Cores, 32 GB RAM, 4x7.2k 4TB HDD in RAID 10
Software
Ubuntu 14.04 LTS (newest version/updates), default installation
Logstash 2.2.4, default installation via repo
Test logs: 44,405,227 log rows in 187 files, file size ~18 MB, 4 columns (Date, Time, Url, Servername)
Very simple Logstash config with a metrics filter:
input {
  file {
    path => "/root/logs/*"
    type => "lg224_es24"
    ignore_older => "0"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  if [type] == "lg224_es24" {
    metrics {
      meter => "events"
      add_tag => "metric"
    }
  }
}
output {
  if "metric" in [tags] {
    stdout {
      codec => line {
        format => "rate: %{[events][rate_1m]}"
      }
    }
  }
}
Results
First system (virtual; the RAID 10 and SSD results are roughly equal):
# /opt/logstash/bin/logstash -f /etc/logstash/speedtest3.conf
Settings: Default pipeline workers: 16
Logstash startup completed
rate: 0.0
rate: 8684.2
rate: 9087.192141385285
rate: 9159.296716612702
rate: 9374.465454928883
rate: 9561.988051370114
rate: 9690.365694600969
rate: 9649.479151150666
rate: 9781.111698328365
rate: 9806.128865644157
rate: 9938.029286860687
rate: 10168.842729060667
rate: 10372.390241870278
rate: 10470.384587517554
rate: 10423.70774909221
rate: 10398.72100908067
rate: 10628.29580156215
rate: 10727.449058668883
rate: 10796.670681968226
rate: 10824.937325537152
rate: 10682.077696770191
rate: 10646.235391116312
rate: 10519.598905288813
Peak is at 12k events per second.
Second system (physical):
$ /opt/logstash/bin/logstash -f speedtest3.conf
Settings: Default pipeline workers: 4
Logstash startup completed
rate: 0.0
rate: 10867.2
rate: 11946.53643803584
rate: 13039.24653207971
rate: 14058.580578353722
rate: 14988.993295947235
rate: 15814.679170979603
rate: 16598.541408674624
rate: 17323.103608007415
rate: 17961.74855767535
rate: 18546.212008719103
rate: 19067.265607298563
rate: 19540.549453671785
rate: 19988.928426774728
rate: 20397.77903968827
rate: 20760.79506428221
rate: 21115.542400093032
rate: 21442.1815628836
rate: 21729.495437505026
rate: 21997.003204276654
rate: 22248.159432843207
rate: 22477.25141961804
rate: 22682.31739369109
Peak is at 26k events per second.
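To put a number on the gap, a small script like the following could be used to compare the two runs. This is only a sketch: the function names (parse_rates, summarize) are made up for illustration, and the two strings contain just a few readings copied from the output above rather than the full logs. Since rate_1m was still climbing on both systems when I copied the output, the figures are rough.

```python
import re

# Matches the lines printed by the stdout output, e.g. "rate: 8684.2"
RATE_LINE = re.compile(r"^rate:\s*([0-9.]+)$")

TOTAL_ROWS = 44_405_227  # size of the test data set described above


def parse_rates(output):
    """Return the numeric value of every 'rate: <number>' line."""
    rates = []
    for line in output.splitlines():
        match = RATE_LINE.match(line.strip())
        if match:
            rates.append(float(match.group(1)))
    return rates


def summarize(rates):
    """Last and highest reading of one run."""
    return {"last": rates[-1], "max": max(rates)}


# Abbreviated excerpts of the two runs (not the complete output)
virtual_output = """Logstash startup completed
rate: 0.0
rate: 8684.2
rate: 10168.842729060667
rate: 10824.937325537152
rate: 10519.598905288813
"""

physical_output = """Logstash startup completed
rate: 0.0
rate: 10867.2
rate: 19067.265607298563
rate: 22682.31739369109
"""

virtual = summarize(parse_rates(virtual_output))
physical = summarize(parse_rates(physical_output))

# How much faster the physical box is, based on the last reading
ratio = physical["last"] / virtual["last"]
print("physical/virtual ratio: %.2f" % ratio)

# Rough time to read all test rows at the last observed rate
for name, run in (("virtual", virtual), ("physical", physical)):
    minutes = TOTAL_ROWS / run["last"] / 60
    print("%s: about %.0f minutes for all rows" % (name, minutes))
```

With the last readings from both runs this gives a factor of a bit more than 2, or roughly 70 versus 33 minutes for the whole test set.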
I knew that physical systems are "always" faster, but not twice as fast as in this case. The Logstash throughput shown above is slowing down the whole stack.
Does anyone have an idea whether this is "normal", or what I might change to speed up the virtual environment?
Thank you
Andreas