Index / update performance with elasticsearch Java plugin?


#1

Hi,

in our project we use a Java plugin to merge existing documents with new ones. I have now noticed that overall indexing performance has become really slow.
It seems that the performance drop traces back to the call of the Java plugin itself, not to its content.

Here is an example of an almost empty plugin:

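import java.util.Map;

import org.elasticsearch.common.Nullable;
import org.elasticsearch.script.AbstractExecutableScript;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;

public class UpsertScriptFactory2 implements NativeScriptFactory {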

  @Override
  public ExecutableScript newScript(@Nullable Map<String, Object> params) {
    return new UpsertScriptFactory2.CustomScript();
  }

  @Override
  public boolean needsScores() {
    return false;
  }

  @Override
  public String getName() {
    return "upsert_script2";
  }

  private class CustomScript extends AbstractExecutableScript {

    @Override
    public Object run() {
      System.out.println("a");
      return null;
    }
  }
}

and here is the corresponding logstash config:

output {
      elasticsearch {
         action => "update"
         user => "logstash"
         password => "mypw"
         hosts => ["localhost:9200"]
         index => "myindex"
         document_type => "mytype"
         document_id => "%{[uuid]}"
         script => "upsert_script2"
         doc_as_upsert => true
         script_lang => "native"
         script_type => "inline"
         script_var_name => "source"
         idle_flush_time => 0.3
         retry_on_conflict => 10
      }
}

The effect is that on my local machine only about 700 documents are inserted / updated.

Without calling the "upsert_script2" Plugin it is really fast (about 2000 documents per second).

Does anyone have an idea what we could do to boost the performance?

Thanks and regards


(Alexander Reelsen) #2

Hey,

Can you remove the println so that we get a true no-op comparison between runs with and without the script, and then share those numbers?

--Alex


#3

Yes, of course. Without the println I get about 1400 documents in 1 minute.

When I change the logstash config to

output {
      elasticsearch {
         action => "index"
         user => "logstash"
         password => "changeme"
         hosts => ["localhost:9200"]
         index => "myindex"
         document_type => "%{[monitoring_type]}"
         document_id => "%{[uuid]}"
         idle_flush_time => 0.3
         retry_on_conflict => 10
      }
}

(insert instead of update)

After 20 seconds, more than 12000 documents are inserted.


(Alexander Reelsen) #4

So, there are two performance-limiting factors at play here. First, when running an update, Elasticsearch has to fetch the document source first (from disk), then call the script to apply the changes, and then store the document, whereas an insert just has to store the document. Second, scripts themselves have a notable performance impact, though I do not have any concrete numbers.
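To make the difference between the two request types concrete, here is a minimal sketch of a toy in-memory store. It is not Elasticsearch code: the class `ToyStore` and its counters are hypothetical and only count the steps described above (an index is one write; an update is a read, a merge, and a write).

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model, NOT Elasticsearch code: it only counts the steps
// an "index" and an "update" request have to perform.
final class ToyStore {
    private final Map<String, Map<String, Object>> docs = new HashMap<>();
    int reads = 0;
    int writes = 0;

    // "index": store the document, replacing any existing one.
    void index(String id, Map<String, Object> doc) {
        docs.put(id, new HashMap<>(doc));
        writes++;
    }

    // "update" with upsert behaviour: fetch the current source,
    // merge the partial document into it, then store the result.
    void update(String id, Map<String, Object> partial) {
        Map<String, Object> current = docs.get(id);
        reads++;
        Map<String, Object> merged =
                (current == null) ? new HashMap<>() : new HashMap<>(current);
        merged.putAll(partial); // stands in for the script / merge step
        docs.put(id, merged);
        writes++;
    }

    Map<String, Object> get(String id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        ToyStore indexed = new ToyStore();
        ToyStore updated = new ToyStore();
        for (int i = 0; i < 1000; i++) {
            Map<String, Object> doc = Map.of("uuid", "id-" + i, "value", i);
            indexed.index("id-" + i, doc);
            updated.update("id-" + i, doc);
        }
        System.out.println("index:  reads=" + indexed.reads + " writes=" + indexed.writes);
        System.out.println("update: reads=" + updated.reads + " writes=" + updated.writes);
    }
}
```

The model ignores everything that makes the real read expensive (disk access, versioning, refreshes), so it shows only where the extra work comes from, not how large it is.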

So, you compared:

  • pure insertion of documents: 600/s
  • without calling the script, but still using update: 2000/s
  • script update: 23/s

These numbers don't add up somehow. Did I get them wrong?
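For reference, the 600/s and 23/s figures above follow from the raw counts reported earlier in the thread (12000 documents in 20 seconds, 1400 documents in 1 minute). A minimal sketch of that conversion, where the class and method names are hypothetical:

```java
// Converts the raw counts reported in the thread into documents per second.
final class ThroughputRates {

    // Documents per second for a given count and duration in seconds.
    static double perSecond(long documents, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return documents / seconds;
    }

    public static void main(String[] args) {
        double scriptUpdate = perSecond(1_400, 60); // update via the no-op script
        double plainIndex = perSecond(12_000, 20);  // action => "index"
        System.out.printf("script update: %.1f docs/s%n", scriptUpdate);
        System.out.printf("plain index:   %.1f docs/s%n", plainIndex);
        System.out.printf("ratio index/update: %.1f%n", plainIndex / scriptUpdate);
    }
}
```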


#5

The pure insertion is very, very fast (as expected).
The updates are very slow, even without my plugin.

output {
      elasticsearch {
         action => "update"
         user => "logstash"
         password => "changeme"
         hosts => ["localhost:9200"]
         index => "myindex"
         document_type => "%{[monitoring_type]}"
         document_id => "%{[uuid]}"
         idle_flush_time => 0.3
         retry_on_conflict => 10
      }
}

is also very slow.

Do you have suggestions for some parameters, e.g. pipeline.batch.size?


(Alexander Reelsen) #6

Hey,

I'd put logstash out of the equation first and run some benchmarks against ES only. Just to nail down the culprit.
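One way to do this is to send the same documents straight to the `_bulk` endpoint, once as `index` actions and once as `update` actions with `doc_as_upsert`, and time both. The sketch below only builds the newline-delimited request bodies. The class `BulkBodies` is hypothetical, ids and names are assumed to need no JSON escaping, and the `_type` field is an assumption that matches the `document_type` setting used in this thread (drop it on versions without mapping types).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Builds newline-delimited _bulk bodies so that "index" and "update"
// can be timed against Elasticsearch directly, without Logstash.
// Assumption: index, type and id contain no characters that need JSON escaping.
final class BulkBodies {

    // One action line plus one source line per document.
    static String indexBody(String index, String type, Map<String, String> sourcesById) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : sourcesById.entrySet()) {
            sb.append(actionLine("index", index, type, e.getKey())).append('\n');
            sb.append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Same documents as partial updates with doc_as_upsert enabled.
    static String updateBody(String index, String type, Map<String, String> sourcesById) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : sourcesById.entrySet()) {
            sb.append(actionLine("update", index, type, e.getKey())).append('\n');
            sb.append("{\"doc\":").append(e.getValue()).append(",\"doc_as_upsert\":true}\n");
        }
        return sb.toString();
    }

    private static String actionLine(String action, String index, String type, String id) {
        return "{\"" + action + "\":{\"_index\":\"" + index
                + "\",\"_type\":\"" + type + "\",\"_id\":\"" + id + "\"}}";
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"message\":\"first\"}");
        docs.put("2", "{\"message\":\"second\"}");
        System.out.print(indexBody("mydata", "mytype", docs));
        System.out.print(updateBody("mydata", "mytype", docs));
    }
}
```

Each body can then be POSTed to `/_bulk` with the `Content-Type: application/x-ndjson` header, for example with curl, and the wall-clock times of the two runs compared.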

Next step would be to debug down where the performance impact is (scripting infra, etc).

Also, is it possible to do some of the scripting client-side or in Logstash to save some work on the ES side?
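As an illustration of what a client-side merge could look like, here is a minimal sketch. The thread does not show the plugin's actual merge rules, so the behaviour here is an assumption: incoming fields overwrite existing ones, and nested objects are merged recursively. The class name `ClientSideMerge` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Merges an incoming document into an existing one on the client.
// Assumed rules (not taken from the original plugin): incoming values win,
// nested maps are merged recursively, everything else is replaced.
final class ClientSideMerge {

    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(Map<String, Object> existing, Map<String, Object> incoming) {
        Map<String, Object> result = new LinkedHashMap<>(existing);
        for (Map.Entry<String, Object> e : incoming.entrySet()) {
            Object oldValue = result.get(e.getKey());
            Object newValue = e.getValue();
            if (oldValue instanceof Map && newValue instanceof Map) {
                // Both sides are objects: merge field by field.
                result.put(e.getKey(),
                        merge((Map<String, Object>) oldValue, (Map<String, Object>) newValue));
            } else {
                result.put(e.getKey(), newValue);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> existing = new LinkedHashMap<>();
        existing.put("uuid", "abc");
        existing.put("tags", Map.of("env", "dev", "team", "ops"));

        Map<String, Object> incoming = new LinkedHashMap<>();
        incoming.put("tags", Map.of("env", "prod"));
        incoming.put("count", 2);

        System.out.println(merge(existing, incoming));
    }
}
```

The merged document could then be sent with a plain `index` action. This only helps if the client already holds the existing document; fetching it first moves the read to the client instead of removing it.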

--Alex


#7

I tested a little more and found out that the Java plugin is not responsible for the loss of performance!
The problem really is action => "update".

Summary:
action => "index" is really fast: about 100'000 documents are handled in 20 seconds.

input {
    file {
        codec => json {
            charset => "UTF-8"
        }
        path => "/C:/tmp/data.log"
        start_position => beginning
    }
}
 
output {
    elasticsearch {
        action => "index"
        index => "mydata"
        document_id => "%{[uuid]}"
    }
}

action => "update" is really slow (the Java plugin does not have an impact): only about 4000 documents per minute are handled.

input {
    file {
        codec => json {
            charset => "UTF-8"
        }
        path => "/C:/tmp/data.log"
        start_position => beginning
    }
}
 
output {
    elasticsearch {
        action => "update"
        doc_as_upsert => true
        #script => "myJavaPluginScript"
        #script_lang => "native"
        #script_type => "inline"
        #script_var_name => "source"
        index => "mydata"
        document_id => "%{[uuid]}"
    }
}

Could you please verify this in your environment?
Of course an "update" operation is not cheap, but I think this behavior really is too slow.

Thanks and regards

