Update performance - very low indexing rate


(Matthias Wilhelm) #1

We're using a Redis -> Logstash -> Elasticsearch pipeline.

Our test system is a single-instance installation with 8 CPUs and 12 GB RAM, running on VMware. Currently we can't separate the components, so no clustering is possible (this will change in the future).

Here's the Logstash Redis input:

redis {
    data_type => "list"
    host      => "${REDIS_HOST:127.0.0.1}"
    key       => "import"
    password  => "xxx"
    threads   => "2"
    codec     => json { charset => "ASCII" }
}

The messages are saved in two indices (two outputs in Logstash):

- single-index collects all messages
- summary-index collects special messages; a Groovy script is used to create a summary record that is updated frequently by id (5 to x times)

Here's the Logstash output for the single-index:

elasticsearch {
    index => "single-index"
    hosts => ["127.0.0.1"]
}

Here's the Logstash output for the summary-index:

elasticsearch {
    action            => "update"
    document_id       => "%{uniqueId}"
    index             => "summary-index"
    script            => "summarize"
    script_lang       => "groovy"
    script_type       => "file"
    scripted_upsert   => true
    retry_on_conflict => 5
    hosts             => ["127.0.0.1"]
}

Originally, only a part of the messages was sent to the summary-index. For that scenario, the indexing rate was OK (max about 9,000/s).

Now we have data that is stored in both indices, and while it's clear that performance can't be like in the mixed scenario, we didn't expect the numbers we got.

Sending 1,000 msg/s to Redis (20 messages per id -> 50 summary records/s).

Result:
Only a 1,600/s indexing rate (2,000/s would be enough to keep pace). The odd thing is that the system shows a CPU usage of 50% and a load average of 4, so there seems to be headroom for a higher rate.

With the single-message pipeline deactivated: 700/s
With the Groovy script also deactivated (single-message pipeline still deactivated): 1,000/s

So the question is: how could we improve this performance?
It's clear that the upserts are the bottleneck.

Thanks!


(Mark Walkom) #2

What does the script do?


(Matthias Wilhelm) #3

It extracts values from the incoming messages into a summary record (firstMessage, lastMessage, computed state, list of IPs, etc.).
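The actual summarize script isn't shown in the thread. A rough sketch of what such a scripted upsert could look like in Groovy (all field names are assumptions; `event` is the default variable name the Logstash elasticsearch output binds the event to via `script_var_name`):

```groovy
// Hypothetical sketch only, not the poster's actual script.
// With scripted_upsert => true, ctx._source starts as an empty document
// on the first invocation for a given id.
if (ctx._source.firstMessage == null) {
    ctx._source.firstMessage = event.timestamp
}
ctx._source.lastMessage = event.timestamp

if (ctx._source.ips == null) {
    ctx._source.ips = []
}
if (!ctx._source.ips.contains(event.ip)) {
    ctx._source.ips.add(event.ip)
}
```

Each update therefore reads the current summary document, modifies it, and re-indexes it, which is the read-modify-write pattern discussed later in the thread.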


(Christian Dahlqvist) #4

Which version of Elasticsearch are you using?


(Matthias Wilhelm) #5

Sorry, forgot to mention: Logstash + Elasticsearch 5.6.1.


(Christian Dahlqvist) #6

There was a change in ES 5.x that affected certain types of update scenarios, as outlined in the release notes. This was also discussed in this thread. Does this match how you are doing updates?


Linked topic: Elasticsearch indexing performance below expectations at a single installation
(Matthias Wilhelm) #7

OK, yes, thanks. So the best way is to handle very frequent updates at the application level, right? I didn't find a fitting solution using Logstash for this use case (does anybody know how?), so I'm working on a POC that uses another service.
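One option that might be worth testing inside Logstash itself (not suggested in this thread) is the aggregate filter, which can collapse many per-id events into a single summary event before anything reaches Elasticsearch. A minimal sketch, assuming a uniqueId field and purely illustrative summary fields:

```
filter {
  aggregate {
    task_id => "%{uniqueId}"
    code => "
      map['messageCount'] ||= 0
      map['messageCount'] += 1
      map['firstMessage'] ||= event.get('message')
      map['lastMessage']    = event.get('message')
    "
    push_map_as_event_on_timeout => true
    timeout => 30
    timeout_task_id_field => "uniqueId"
  }
}
```

The trade-off is that the aggregate filter requires the pipeline to run with a single worker thread so that all events for an id pass through the same filter instance, which may offset the reduction in upserts.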

One message from the thread you posted:

As of 5.0.0 the get API will issue a refresh if the requested document has been changed since the last refresh but the change hasn’t been refreshed yet. This will also make all other changes visible immediately. This can have an impact on performance if the same document is updated very frequently using a read modify update pattern since it might create many small segments. This behavior can be disabled by passing realtime=false to the get request.
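For reference, the flag mentioned in the quote is a query parameter of the get API itself; illustratively (the type and id here are placeholders):

```
GET /summary-index/<type>/<id>?realtime=false
```

The update API, however, performs its internal get on the current document itself, so whether this flag can be applied from a Logstash output is exactly the open question below.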

Could realtime=false be an option to speed up the current solution (until I've developed a new one)? Can I add it to the Logstash output?

Thank you very much.


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.