Logstash pipeline using elasticsearch filter experiencing low performance

Hi there,

I'm probably having some infrastructure problem, but I could sure use some help figuring out what I need to look at and fix.

I'm trying to 'enrich' data from one index with data from another index using Logstash. My indexing speed is less than 1 document per second (d/s), which I think could be better :wink:

I'm running Logstash in a Docker container on a local MacBook (which is probably one reason for the low performance, but I reckon that even in this setup, 1 d/s is way too slow and can be better).

I'm using a cloud trial:

  • 2 * AWS.data.highio.i3, 4GB
  • 1 * AWS.master.R4, 1GB tiebreaker

Indexing from a CSV on the local machine using the same Logstash container performs at around 200 documents per second. I'm fine with that.

This is my pipeline. It actually works, but as mentioned: really sloooow:

input {
  elasticsearch {
    hosts => ["https://xxxxxxx"]
    user => "xxxx"
    password => "xxxx"
    docinfo => true
    query => '{
      "query": {
        "bool": {
          "must": [
            { "exists": { "field": "xxxxx" } }
          ],
          "filter": [
            { "range": {
                "start_dtime": {
                  "gte": "2019-03-12",
                  "lte": "2019-03-13"
                }
              }
            }
          ]
        }
      }
    }'
  }
}

filter {
  mutate {
    remove_field => [ "@version", "host", "message", "path" ]
    add_field => {
      "xxxx_date"   => "unknown"
      "xxxx_status" => "unknown"
      "xxxx_name"   => "unknown"
    }
  }
  elasticsearch {
    hosts => ["xxxxxxxx"]
    user => "xxxx"
    password => "xxx"
    index => "xxxxxxxx*"
    query => "xxxxx: %{[xxxxx]}"
    fields => {
      "xxxx_date"   => "xxxx_date"
      "xxxx_status" => "xxxx_status"
      "xxxx_name"   => "xxxx_name"
    }
  }
}

output {
  elasticsearch {
    hosts => ["xxxxxxxx"]
    index => "%{[@metadata][_index]}"
    action => "update"
    document_id => "%{[@metadata][_id]}"
    user => "xxxx"
    password => "xxxx"
  }
}
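
As far as I understand, the elasticsearch filter issues one synchronous lookup per event, so I suspect that's where the time goes. Here's an untested sketch of what I was planning to try next (same redacted names as above; the template path is just an example): raise the input's scroll page size, and move the filter's lookup from a parsed query_string to an exact term query via the query_template option described in the plugin docs:

input {
  elasticsearch {
    hosts => ["https://xxxxxxx"]
    user => "xxxx"
    password => "xxxx"
    docinfo => true
    size => 5000    # scroll page size; the plugin default is 1000
    query => '{ ... same bool/range query as above ... }'
  }
}

filter {
  elasticsearch {
    hosts => ["xxxxxxxx"]
    user => "xxxx"
    password => "xxx"
    index => "xxxxxxxx*"
    # file on the Logstash container; sprintf references like %{[xxxxx]}
    # are expanded per event, per the plugin docs
    query_template => "/usr/share/logstash/templates/lookup.json"
    fields => {
      "xxxx_date"   => "xxxx_date"
      "xxxx_status" => "xxxx_status"
      "xxxx_name"   => "xxxx_name"
    }
  }
}

with lookup.json containing:

{
  "size": 1,
  "query": {
    "term": { "xxxxx": "%{[xxxxx]}" }
  }
}

(A term query needs a keyword field, so this assumes xxxxx is mapped as keyword or has a .keyword sub-field.)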

My Logstash configuration:

pipeline:
  workers: 5
  batch:
    size: 1000
    delay: 50
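
If I read the docs right, that nested form is equivalent to these flat keys in logstash.yml:

pipeline.workers: 5
pipeline.batch.size: 1000
pipeline.batch.delay: 50   # ms to wait before flushing an underfilled batch

With 5 workers and 1000-event batches there can be up to 5000 events in flight, so at 1 d/s each filter lookup would have to be taking seconds, which makes me suspect the per-event query rather than these settings.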

Index settings:

"settings" : {
  "index" : {
    "creation_date" : "1552728048556",
    "number_of_shards" : "1",
    "number_of_replicas" : "1",

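One general tuning I've read about for heavy update runs (not sure how much it helps on a small trial cluster; the index name below is just the redacted placeholder) is relaxing refresh and replicas while the enrichment runs, then restoring them afterwards:

PUT /xxxxxxxx/_settings
{
  "index": {
    "refresh_interval": "30s",
    "number_of_replicas": 0
  }
}

PUT /xxxxxxxx/_settings
{
  "index": {
    "refresh_interval": "1s",
    "number_of_replicas": 1
  }
}

The second call restores the settings shown above once the run finishes.
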
Can anyone give me a hint about where to look, or how to tweak this so it runs a bit faster?

Thanks!

Jeroen

Anyone have any tips?
