So let me try to provide as much context as possible.
We are using ECE; both Elasticsearch and Logstash are on version 6.2.4.
We have a cluster of 10 nodes with 16GB of memory per node.
Every day we have a process responsible for indexing a bunch of documents (~5M) into ES by sending them to Logstash (we use the Pub/Sub input and the Elasticsearch output).
Our Logstash instance is configured with 16GB of RAM, with the JVM heap set to `-Xms8g -Xmx8g`.
logstash.yml has the following settings: `pipeline.workers: 40` and `pipeline.batch.size: 5000`.
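For context, the pipeline is essentially this shape (project, topic, credentials, and index names are placeholders, and the output is shown simplified):

```
input {
  google_pubsub {
    project_id    => "my-project"          # placeholder
    topic         => "my-topic"            # placeholder
    subscription  => "my-subscription"     # placeholder
    json_key_file => "/path/to/key.json"   # placeholder
  }
}

output {
  elasticsearch {
    hosts => ["https://my-cluster.example:9243"]  # placeholder
    index => "my-index"                           # placeholder
  }
}
```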
So basically we are seeing a fairly low indexing rate. Check out the graph below: on the left side is the rate at which messages are posted to Logstash; on the right side is the number of messages remaining to be processed by Logstash over time.
According to the following monitoring graph, the indexing rate seems to plateau at around 1.5k events/s, with a few spikes here and there, but they never last.
Those graphs match perfectly, which leads me to believe that Logstash is indeed the bottleneck here. Any idea why the Events Received Rate would be so low?
Logstash CPU usage is very low (~5%) and heap usage is also fine.
Hi @dadoonet, I think you can move that back to the ES section.
I was wrong in thinking that Logstash was the issue. I configured my pipeline to send the output to stdout (for testing purposes), and now the LS monitoring looks like this.
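The test change was just swapping out the output section, something like:

```
output {
  # Temporary test output: print a dot per event instead of indexing
  # into ES, to measure Logstash's raw processing throughput.
  stdout { codec => dots }
}
```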
I did some tests with a normal index action (instead of the scripted upsert) and it was much faster.
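For reference, the slow path is the scripted-upsert version of the Elasticsearch output, configured roughly like this (hosts, index, and id field are placeholders):

```
output {
  elasticsearch {
    hosts           => ["https://my-cluster.example:9243"]  # placeholder
    index           => "my-index"                           # placeholder
    document_id     => "%{id}"                              # placeholder id field
    action          => "update"
    scripted_upsert => true
    script_type     => "inline"
    script_lang     => "painless"
    script          => "..."   # the Painless script shown further down
  }
}
```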
Is there a limit on how many scripts ES can execute in a given period?
The script itself looks like this; nothing too funky.
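In rough, simplified form (field names are placeholders, not the actual production script):

```
// Minimal sketch of the scripted upsert, with placeholder field names.
// With the Logstash elasticsearch output, the event is exposed to the
// script as params.event (the default script_var_name).
if (ctx._source.count == null) {
  ctx._source.count = params.event.count;
} else {
  ctx._source.count += params.event.count;
}
ctx._source.updated_at = params.event.timestamp;
```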