Looking for suggestions on tuning

Hello all,

I'm reaching out to the community for suggestions and opinions on how best to tune a small cluster that I have built in my home lab. I do not query the data much; the workload is mainly indexing. I would like faster index rates than I currently get.

Machine specs:
2x Xeon 5675 hex core
72 gigs ram
5x SSD

Currently all 4 nodes are LXC containers on a Proxmox server. Each node is on its own separate SSD; the nodes share all 24 cores, and each node has 16 GB of RAM.
Node 1 -> ES, LS, KI
Node 2 -> ES
Node 3 -> ES
Node 4 -> ES

I seem to be hitting a bottleneck at around 1.5k index operations per second, and I haven't been able to pinpoint where it is. I doubt it's disk I/O, since each of the 4 nodes is on its own SSD, and I would expect the cluster to manage more than 1.5k/s. CPU usage is barely above 20% for the entire system. Should I be getting more than 1.5k/s, or is that rate on par with the resources I currently have?
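
One way to sanity-check the 1.5k/s figure independently of any dashboard is to sample the cluster's cumulative index counter twice and divide the difference by the interval. A minimal sketch, assuming the counter is read from `_all.primaries.indexing.index_total` in the `GET /_stats/indexing` response; the `index_rate` helper and the sample numbers are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: average documents per second between two samples
# of the cumulative index counter (index_total), taken `seconds` apart.
index_rate() {
    first=$1; second=$2; seconds=$3
    echo $(( (second - first) / seconds ))
}

# Illustrative numbers only, not measurements from this cluster:
# 45,000 documents indexed over a 30 second window.
index_rate 100000 145000 30
```

Sampling over a longer window smooths out the bursts caused by bulk flushes and refreshes.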

All indexes have had their refresh interval set to 30s:

curl -XPUT localhost:9200/_settings -d '{
    "index" : {
        "refresh_interval" : "30s"
    } }'

Nodes 1-4 elasticsearch.yml

# cat /etc/elasticsearch/elasticsearch.yml | egrep -v "(^#.*|^$)"
cluster.name: Cluster1
node.name: ${HOSTNAME}
network.host: [_site_, _local_]
discovery.zen.ping.unicast.hosts: ["xxx.xxx.xxx.200", "xxx.xxx.xxx.201", "xxx.xxx.xxx.202", "xxx.xxx.xxx.203"]
indices.memory.index_buffer_size: 30%
indices.fielddata.cache.size:  10%

Node 1 logstash/conf.d/01-file.conf

input {
    beats {
        port => 5045
        ssl => true
        ssl_certificate => "/etc/ssl/logstash-forwarder.crt"
        ssl_key => "/etc/ssl/logstash-forwarder.key"
    }
}
filter {
    if [type] == "cowrie" {
        json {
            source => message
        }
        date {
            match => [ "timestamp", "ISO8601" ]
        }
        if [src_ip]  {
            dns {
                reverse => [ "src_host", "src_ip" ]
                action => "append"
            }
            geoip {
                source => "src_ip"  # With the src_ip field
                target => "geoip"   # Add the geoip one
                database => "/opt/logstash/vendor/geoip/GeoLite2-City.mmdb"
            }
        }
    }
}
output {
    if [type] == "cowrie" {
        # Output to elasticsearch
        elasticsearch {
           hosts => ["xxx.xxx.xxx.200:9200","xxx.xxx.xxx.201:9200","xxx.xxx.xxx.202:9200","xxx.xxx.xxx.203:9200"]
           sniffing => true
           manage_template => false
           index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
           document_type => "%{[@metadata][type]}"
        }
        # For debugging
        stdout {
            codec => rubydebug
        }
    }
}

It is hard to say whether this ingestion rate is high or low, as it depends on many factors, such as the complexity of your documents and your mappings, that differ from one use case to the next.

It may be worth looking at https://www.elastic.co/guide/en/elasticsearch/reference/current/general-recommendations.html and https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html.
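
As an illustration of the kind of change the second link describes, the settings below raise the refresh interval and drop replicas for the duration of a heavy load. This is only a sketch: `settings_body` is a hypothetical helper, the values are examples rather than recommendations for this cluster, and replicas should be restored once the load finishes.

```shell
#!/bin/sh
# Hypothetical helper: build an index settings body from a refresh
# interval (e.g. 30s) and a replica count (e.g. 0).
settings_body() {
    printf '{"index":{"refresh_interval":"%s","number_of_replicas":%d}}\n' "$1" "$2"
}

# Print the body that would be sent; example values only.
settings_body 30s 0

# To apply it to all indices (not run here), in the same curl style
# as the original post:
#   curl -XPUT localhost:9200/_settings -d "$(settings_body 30s 0)"
```

Running with zero replicas trades redundancy for indexing speed, so it only suits data that can be re-indexed if a node is lost.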
