How are you measuring it?
Have you load tested?
What JVM? What OS? What hardware?
What's the mapping?
What else is happening in your cluster when you query?
JVM: Java 8
The hardware is the same for both clusters.
Neither cluster serves any other queries; they are used only for this load testing.
And yes, we are running a script that captures the response times of 100 different queries; a sketch of the script is below.
The issue becomes severe when we query for 100k docs (size: 100000).
We have a use case where we need to filter and fetch 100k docs.
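Roughly, the timing script does the following (a minimal sketch; the endpoint, index name, and queries are placeholders for ours):

import time
import requests

ES = "http://localhost:9200"  # placeholder endpoint
INDEX = "test_index"          # placeholder index name
QUERIES = [{"term": {"some_field": "value_%d" % i}} for i in range(100)]  # stand-ins for our 100 queries

for q in QUERIES:
    # size: 100000 requires index.max_result_window to be raised above its 10000 default
    body = {"query": q, "size": 100000}
    start = time.time()
    resp = requests.post(ES + "/" + INDEX + "/_search", json=body)
    wall = time.time() - start
    data = resp.json()
    print("wall=%.3fs took=%dms hits=%d" % (wall, data["took"], data["hits"]["total"]))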
I have tested with both LZ4 and best_compression on the ES 5 cluster; performance is bad in both cases.
For 100k docs:
ES 1.4: ~2s
ES 5.4 (LZ4): ~3.5s
ES 5.4 (best_compression): ~5s
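For reference, the codec was switched per index like this (index.codec is a static setting, so it has to be set at index creation time, or on a closed index that is then reopened; "test_index" is a placeholder):

import requests

ES = "http://localhost:9200"  # placeholder endpoint

# Create the index with the DEFLATE-based codec; omit the setting
# (or use "default") to get the LZ4-based codec.
requests.put(ES + "/test_index", json={
    "settings": {"index.codec": "best_compression"}
})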
Here is the hot_threads output captured while the query runs:
100.2% (501ms out of 500ms) cpu usage by thread 'elasticsearch[es-data-nqa-spr_nqa_elasticsearch-5.4-1-7-1c][[transport_server_worker.default]][T#12]'
7/10 snapshots sharing following 52 elements
java.util.zip.Inflater.inflateBytes(Native Method)
java.util.zip.Inflater.inflate(Inflater.java:259)
org.apache.lucene.codecs.compressing.CompressionMode$DeflateDecompressor.decompress(CompressionMode.java:224)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$BlockState.document(CompressingStoredFieldsReader.java:560)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.document(CompressingStoredFieldsReader.java:576)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:583)
org.apache.lucene.index.CodecReader.document(CodecReader.java:88)
org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:411)
org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:347)
I only need the id field from the documents. Even when I try stored_fields: ["id"], it still decompresses the documents.
Any insights on how we can get around this?
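The stack trace is consistent with Lucene storing documents in compressed blocks: fetching even a single stored field decompresses the whole block, including _source. One thing worth trying (a sketch I have not verified on 5.4; it assumes id is a keyword or numeric field with doc_values enabled, which is the default in 5.x) is to read id from doc values instead of stored fields and skip _source entirely:

import requests

ES = "http://localhost:9200"  # placeholder endpoint

body = {
    "size": 100000,
    "_source": False,            # don't fetch or return _source
    "docvalue_fields": ["id"],   # read "id" from columnar doc values
    "query": {"match_all": {}},  # placeholder for the real filter
}
resp = requests.post(ES + "/test_index/_search", json=body)
ids = [hit["fields"]["id"][0] for hit in resp.json()["hits"]["hits"]]

Note that the fetch phase may still touch stored fields to resolve each hit's _id, so this may reduce rather than fully eliminate the decompression cost.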