In our project we use a Java plugin to merge existing documents with new ones. I have noticed that overall indexing performance is getting really slow.
The performance drop seems to come from the call into the Java plugin itself, not from what the plugin does.
Here is an example of an almost empty plugin:
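import java.util.Map;

import org.elasticsearch.common.Nullable;
import org.elasticsearch.script.AbstractExecutableScript;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;

public class UpsertScriptFactory2 implements NativeScriptFactory {

    // Called for every script invocation; params are ignored in this example.
    @Override
    public ExecutableScript newScript(@Nullable Map<String, Object> params) {
        return new UpsertScriptFactory2.CustomScript();
    }

    // The script does not use the document score.
    @Override
    public boolean needsScores() {
        return false;
    }

    // Name under which the script is referenced in update requests.
    @Override
    public String getName() {
        return "upsert_script2";
    }

    private class CustomScript extends AbstractExecutableScript {
        // Does nothing except print a marker; the document is left untouched.
        @Override
        public Object run() {
            System.out.println("a");
            return null;
        }
    }
}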
So, there are two performance-limiting factors at play here. First, when running an update query, Elasticsearch has to fetch the document source (from disk), call the script to apply the changes, and then store the document, whereas an insert just has to store the document. Second, executing the script itself will have a notable performance impact, although I do not have any concrete numbers for that.
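To illustrate the difference between the two write paths, here is a toy in-memory model. It is only a sketch and not Elasticsearch code: the class WritePathSketch, its methods, and the read/write counters are all made up for illustration, and the "script" is just a Java function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Toy model of the insert path versus the update path (hypothetical, not Elasticsearch code). */
public class WritePathSketch {
    // Stand-in for the stored documents, keyed by id.
    private final Map<String, Map<String, Object>> store = new HashMap<>();
    // Count simulated store accesses so the two paths can be compared.
    private int reads = 0;
    private int writes = 0;

    /** Insert path: a single write, no read and no script. */
    public void index(String id, Map<String, Object> source) {
        store.put(id, new HashMap<>(source));
        writes++;
    }

    /** Update path: read the current source, run the script on it, write the result back. */
    public void update(String id, UnaryOperator<Map<String, Object>> script) {
        Map<String, Object> current = store.get(id);
        reads++;
        // Work on a copy; start from an empty document if nothing is stored yet.
        Map<String, Object> base = (current == null) ? new HashMap<>() : new HashMap<>(current);
        Map<String, Object> merged = script.apply(base);
        store.put(id, merged);
        writes++;
    }

    public Map<String, Object> get(String id) {
        return store.get(id);
    }

    public int reads() {
        return reads;
    }

    public int writes() {
        return writes;
    }
}
```

In this model an index call costs one write, while every update costs one read, one script call, and one write, which is the extra work described above.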
So, you compared:
pure insertion of documents: 600/s
without calling the script, but still using update: 2000/s
script update: 23/s
These numbers don't add up somehow: I would not expect an update without a script (2000/s) to be faster than a pure insert (600/s). Did I get them wrong?
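Converting the three rates into time per document makes the comparison easier to read. This is just a quick sketch: the class and method names are made up, and only the three rates come from this thread.

```java
import java.util.Locale;

/** Converts the throughput figures quoted above into per-document times (illustrative only). */
public class ThroughputCheck {

    /** Converts documents per second into milliseconds per document. */
    public static double millisPerDoc(double docsPerSecond) {
        return 1000.0 / docsPerSecond;
    }

    public static void main(String[] args) {
        double insert = millisPerDoc(600);          // pure insertion
        double updateNoScript = millisPerDoc(2000); // update without calling the script
        double scriptUpdate = millisPerDoc(23);     // update with the script

        System.out.printf(Locale.ROOT, "insert:                %.2f ms/doc%n", insert);
        System.out.printf(Locale.ROOT, "update without script: %.2f ms/doc%n", updateNoScript);
        System.out.printf(Locale.ROOT, "script update:         %.2f ms/doc%n", scriptUpdate);
        // How many times slower the scripted update is than the update without a script.
        System.out.printf(Locale.ROOT, "slowdown factor:       %.0fx%n", scriptUpdate / updateNoScript);
    }
}
```

By these figures a scripted update takes about 43 ms per document, against roughly 1.7 ms for an insert and 0.5 ms for an update without the script.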