I wanted to get some clarification on how Elasticsearch handles/uses Smile binary JSON. Mainly:
1. Does ES convert JSON to Smile before saving into Lucene?
2. Does ES use Smile as the wire protocol for the Java Client?
3. If I wanted to have everything in Smile format (what's stored in Lucene, fielddata, and the communication between server and client), how should I do it? Should I just set the "source" to a Smile byte array using the Java Client?
Note that I use the Java Client and don't really use the REST API, except for debugging.
This may not help you, but it's based on my own experience.
Using the Java API (I also don't use the REST API except for exploration
and problem reporting), I just use the JSON string as the source for every
document.
For one of my applications (more generic), I have my own generic Record
object that stores a mapping of field name to one or more field values. I
then use the JSON stream parser to set it, and the XContentBuilder to
generate it. Very quick, and very generic.
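To make the idea of a generic record concrete, here is a minimal sketch of my own. The class name GenericRecord and its methods are hypothetical, and the hand-rolled JSON writer only stands in for XContentBuilder so that the example needs nothing beyond the JDK; it is not the code from my application.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical generic record: each field name maps to one or more string values. */
final class GenericRecord {
    // LinkedHashMap keeps fields in insertion order, so the JSON output is stable.
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    /** Adds a value to a field, creating the field on first use. */
    GenericRecord add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        return this;
    }

    /** Returns all values of a field, or an empty list if the field is absent. */
    List<String> values(String name) {
        return fields.getOrDefault(name, Collections.<String>emptyList());
    }

    /** Writes single-valued fields as strings and multi-valued fields as arrays. */
    String toJson() {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append(quote(e.getKey())).append(':');
            List<String> vs = e.getValue();
            if (vs.size() == 1) {
                sb.append(quote(vs.get(0)));
            } else {
                sb.append('[');
                for (int i = 0; i < vs.size(); i++) {
                    if (i > 0) sb.append(',');
                    sb.append(quote(vs.get(i)));
                }
                sb.append(']');
            }
        }
        return sb.append('}').toString();
    }

    /** Minimal JSON string escaping for quotes, backslashes and control characters. */
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

Calling new GenericRecord().add("title", "a").add("tag", "x").add("tag", "y").toJson() produces a JSON string that can be passed as the document source.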
For even faster processing, I include the Jackson 2.0 libraries and then
use the Data Binding model to serialize a Java object to JSON and
deserialize back into a Java object. This is not as generic, but it's
application-specific and easy to adapt to enhancements or new applications.
To measure the performance, I created a test driver that performed the
following 4 steps:
1. Serialize a moderately complex Java object into JSON.
2. Publish the JSON string to an LMAX Disruptor ring buffer.
3. Consume the JSON string from the LMAX Disruptor ring buffer.
4. Deserialize the JSON string back into a Java object.
Steps 1-4, inclusive, were performed on two threads on my i7 MacBook at 2
million per second. So I have no worries at all about performance when
using JSON! Therefore, I am smiling already and have never felt the need to
SMILE more.
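The shape of that test driver can be sketched roughly as follows. This is not the original driver: a java.util.concurrent.ArrayBlockingQueue stands in for the LMAX Disruptor ring buffer, the Event class is a hypothetical and much simpler object than the one I measured, and the hand-written toJson/fromJson pair replaces Jackson data binding so the sketch runs on the JDK alone. It will not reach the same throughput; it only shows the four steps on two threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical payload; names must not contain quotes or backslashes. */
final class Event {
    private static final Pattern SHAPE =
            Pattern.compile("\\{\"id\":(-?\\d+),\"name\":\"([^\"\\\\]*)\"\\}");

    final long id;
    final String name;

    Event(long id, String name) {
        this.id = id;
        this.name = name;
    }

    /** Step 1: serialize to JSON (no escaping, see the class comment). */
    String toJson() {
        return "{\"id\":" + id + ",\"name\":\"" + name + "\"}";
    }

    /** Step 4: deserialize; accepts only the exact shape toJson() emits. */
    static Event fromJson(String json) {
        Matcher m = SHAPE.matcher(json);
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected JSON: " + json);
        }
        return new Event(Long.parseLong(m.group(1)), m.group(2));
    }
}

final class RoundTripDriver {
    /** Pushes n events through the four steps; returns how many arrived intact. */
    static long run(final int n) {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        final long[] intact = new long[1];

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    String json = queue.take();        // step 3: consume
                    Event e = Event.fromJson(json);    // step 4: deserialize
                    if (e.id == i && e.name.equals("event-" + i)) {
                        intact[0]++;
                    }
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (int i = 0; i < n; i++) {
                String json = new Event(i, "event-" + i).toJson(); // step 1: serialize
                queue.put(json);                                   // step 2: publish
            }
            consumer.join(); // join makes the consumer's count visible here
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", ex);
        }
        return intact[0];
    }
}
```

Wrapping RoundTripDriver.run(n) between two System.nanoTime() calls gives a rough events-per-second figure for your own object and machine.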
1. No (the cluster state of ES, which is not part of Lucene, is saved to
disk in SMILE format).
2. No.
3. Yes, you can use SMILE with the XContentBuilder classes. The result can
be transported to the cluster; the decoding of SMILE is done transparently.
Because the transport is LZF compressed by default, you should consider
whether disabling compression for SMILE is worth it. SMILE is a compact
binary JSON format, but I don't have numbers on whether it has any advantage
over plain JSON with LZF compression (I doubt that SMILE is better).
Also note that there is CBOR in the latest ES releases, which seems superior
to SMILE in many aspects (compactness, speed, standardization in RFC 7049).
Jörg
On Thu, May 29, 2014 at 7:07 AM, Drew Kutcharian drew@venarc.com wrote:
Hey Guys,
I wanted to get some clarification on how Elasticsearch handles/uses Smile
binary JSON. [...]
Thanks for the comments. If I understand you correctly, you're saying that if I use SMILE and/or CBOR, the communication/storage won't be compressed using LZF?
My idea is to disable compression when using SMILE / CBOR: both come with a
serialization overhead and already produce reasonably compact binary
results, so compression may not work as well as it does on plain JSON.
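Since I have no numbers, the easiest way to decide is to measure your own payloads. The sketch below is only a starting point under stated assumptions: java.util.zip.Deflater stands in for LZF (the JDK has no LZF codec), and PayloadSizes, compressedSize and sampleJson are hypothetical names. To compare against SMILE or CBOR, pass the bytes produced by the corresponding encoder to compressedSize and compare both the raw and the compressed lengths.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class PayloadSizes {
    /** Returns the number of bytes the input occupies after DEFLATE (fast setting). */
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        // Only the byte count matters here, so the buffer is reused and discarded.
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Builds a JSON array of small documents with repeated field names. */
    static byte[] sampleJson(int docs) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < docs; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"user\":\"user").append(i)
              .append("\",\"message\":\"hello world\",\"count\":").append(i)
              .append('}');
        }
        return sb.append(']').toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

Plain JSON with repeated field names, as in sampleJson, gives a general-purpose compressor a lot of redundancy to remove; whether a binary encoding leaves enough redundancy to justify the extra compression step is exactly what such a measurement should show.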
Jörg
On Fri, May 30, 2014 at 6:18 AM, Drew Kutcharian drew@venarc.com wrote:
Hi Jörg,
Thanks for the comments. If I understand you correctly, you’re saying that
if I use SMILE and/or CBOR, the communication/storage won’t be compressed
using LZF?