Elasticsearch and Smile encoded JSON

Drew_Kutcharian · May 29, 2014, 5:07am

Hey Guys,

I wanted to get some clarification on how Elasticsearch handles/uses Smile binary JSON. Mainly:

1. Does ES convert JSON to Smile before saving into Lucene?
2. Does ES use Smile as the wire protocol for the Java Client?
3. If I wanted to have everything in Smile format (what's stored in Lucene, fielddata, and the communication between server and client), how should I do it? Should I just set the "source" to a Smile byte array using the Java Client?

Note that I use the Java Client and don't really use the REST API, except for debugging.

Best,

Drew

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4D73E3EC-83A6-459F-AB2B-F6540D5BE3BD%40venarc.com.
For more options, visit https://groups.google.com/d/optout.

brian_yoder · May 29, 2014, 7:56pm

Drew,

This may not help you, but it's based on my own experience.

Using the Java API (I also don't use the REST API except for exploration
and problem reporting), I just use the JSON string as the source for every
document.

For one of my applications (more generic), I have my own generic Record
object that stores a mapping of field name to one or more field values. I
then use the JSON stream parser to set it, and the XContentBuilder to
generate it. Very quick, and very generic.

For even faster processing, I include the Jackson 2.0 libraries and then
use the Data Binding model to serialize a Java object to JSON and
deserialize back into a Java object. This is not as generic, but it's
application-specific and easy to adapt to enhancements or new applications.
To measure the performance, I created a test driver that performed the
following 4 steps:

1. Serialize a moderately complex Java object into JSON.
2. Publish the JSON string to an LMAX Disruptor ring buffer.
3. Consume the JSON string from the LMAX Disruptor ring buffer.
4. Deserialize the JSON string back into a Java object.

Steps 1-4 inclusive were performed on two threads on my i7 MacBook at a rate
of 2 million per second. So I have no worries at all about performance when
using JSON! Therefore, I am smiling already and have never felt the need to
SMILE more.
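
The four steps above can be sketched with the JDK alone. This is only an illustration under loud assumptions: `ArrayBlockingQueue` stands in for the LMAX Disruptor ring buffer, a hand-written string format stands in for Jackson data binding, and the `Reading` class is a hypothetical payload (the actual test object is not shown in this thread). Throughput from this sketch says nothing about the 2 million per second figure.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class JsonPipelineSketch {
    /** Hypothetical payload type; the thread does not show the real object. */
    static final class Reading {
        final long id;
        final String sensor;
        Reading(long id, String sensor) { this.id = id; this.sensor = sensor; }
    }

    // Step 1: serialize (hand-rolled; assumes sensor names need no escaping).
    static String toJson(Reading r) {
        return "{\"id\":" + r.id + ",\"sensor\":\"" + r.sensor + "\"}";
    }

    // Step 4: deserialize exactly the shape produced by toJson.
    static Reading fromJson(String json) {
        int idStart = json.indexOf("\"id\":") + 5;
        int idEnd = json.indexOf(',', idStart);
        int nameStart = json.indexOf("\"sensor\":\"") + 10;
        int nameEnd = json.indexOf('"', nameStart);
        return new Reading(Long.parseLong(json.substring(idStart, idEnd)),
                           json.substring(nameStart, nameEnd));
    }

    /** Pushes count objects through all four steps; returns the sum of the ids the consumer saw. */
    static long run(int count) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    queue.put(toJson(new Reading(i, "s" + (i % 8)))); // steps 1 and 2
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        try {
            for (int i = 0; i < count; i++) {
                sum += fromJson(queue.take()).id; // steps 3 and 4
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
        return sum;
    }

    public static void main(String[] args) {
        int count = 1_000_000;
        long start = System.nanoTime();
        long sum = run(count);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d round trips in %.2f s (id sum %d)%n", count, seconds, sum);
    }
}
```

Returning the id sum from `run` gives a cheap check that every message made it through both threads intact.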

Brian

jprante · May 29, 2014, 9:52pm

1. No (the cluster state of ES, which is not part of Lucene, is saved to disk
   in SMILE format).
2. No.
3. Yes, you can use SMILE on XContentBuilder classes. The result can be
   transported to the cluster; the decoding of SMILE is done transparently.
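
To make "transparent decoding" more concrete: binary formats like these can be recognized from their first bytes, so a receiver does not need to be told which encoding was used. The sketch below is a simplified illustration, not Elasticsearch's actual code; the class and enum names are hypothetical. It relies on two facts from the format specifications: a SMILE document starts with the header bytes `:)\n`, and CBOR data may optionally start with the self-describe tag 55799 (bytes 0xD9 0xD9 0xF7, RFC 7049).

```java
import java.nio.charset.StandardCharsets;

/** Simplified content-type sniffing; hypothetical names, not Elasticsearch's implementation. */
final class ContentSniffer {
    enum Format { JSON, SMILE, CBOR, UNKNOWN }

    static Format detect(byte[] data) {
        // SMILE: 3-byte header ":)\n" followed by a version/flags byte.
        if (data.length >= 4 && data[0] == ':' && data[1] == ')' && data[2] == '\n') {
            return Format.SMILE;
        }
        // CBOR: optional self-describe tag 55799, encoded as 0xD9 0xD9 0xF7.
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xD9
                && (data[1] & 0xFF) == 0xD9
                && (data[2] & 0xFF) == 0xF7) {
            return Format.CBOR;
        }
        // JSON: the first non-whitespace byte opens an object.
        for (byte b : data) {
            if (b == ' ' || b == '\t' || b == '\n' || b == '\r') {
                continue;
            }
            return b == '{' ? Format.JSON : Format.UNKNOWN;
        }
        return Format.UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] json = "{\"user\":\"drew\"}".getBytes(StandardCharsets.UTF_8);
        // SMILE header plus illustrative payload bytes.
        byte[] smile = {':', ')', '\n', 0x01, (byte) 0xFA, (byte) 0xFB};
        System.out.println(detect(json));
        System.out.println(detect(smile));
    }
}
```

CBOR written without the optional tag would not be caught by this check, so a real detector needs more than these three tests.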

Because the transport is LZF compressed by default, you should consider
whether disabling it for SMILE is worth it. SMILE is a compact binary JSON
encoding, but I don't have numbers on whether it has any advantage over plain
JSON with LZF compression (I doubt that SMILE is better).

Also note, there is CBOR in the latest ES releases, which seems superior to
SMILE in many aspects (compactness, speed, standardization in RFC 7049):

https://github.com/elasticsearch/elasticsearch/pull/5509

Jörg

Drew_Kutcharian · May 30, 2014, 4:18am

Hi Jörg,

Thanks for the comments. If I understand you correctly, you're saying that if I use SMILE and/or CBOR, the communication/storage won't be compressed using LZF?

Drew

jprante · May 30, 2014, 9:47am

LZF compression is always enabled on the transport layer.

According to http://wiki.fasterxml.com/SmileFormatSpec, compression within
SMILE is possible, but no compression scheme is included in SMILE itself.

My idea is to disable compression for SMILE / CBOR: both come with a
serialization overhead and already produce reasonably "compact" binary
results, so compression may not work as well on them as on plain JSON.
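
One way to get numbers is to measure raw and compressed sizes side by side. The sketch below only shows the shape of such a measurement, under stated assumptions: DEFLATE from `java.util.zip` stands in for LZF (the JDK does not ship LZF), a crude `DataOutputStream` encoding stands in for SMILE / CBOR, and the records are made up. To measure the real thing, swap in actual SMILE or CBOR bytes and an LZF codec.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class CompressionComparison {
    /** Compresses data with DEFLATE (a stand-in for LZF). */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Made-up records as plain JSON text. */
    static byte[] asJson(int records) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < records; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"id\":").append(i)
              .append(",\"name\":\"user").append(i)
              .append("\",\"active\":true}");
        }
        return sb.append(']').toString().getBytes(StandardCharsets.UTF_8);
    }

    /** The same records in a crude binary layout (NOT SMILE or CBOR). */
    static byte[] asCompactBinary(int records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < records; i++) {
                out.writeInt(i);
                out.writeUTF("user" + i);
                out.writeBoolean(true);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] json = asJson(1000);
        byte[] binary = asCompactBinary(1000);
        System.out.println("JSON:   " + json.length + " raw, " + deflate(json).length + " deflated");
        System.out.println("binary: " + binary.length + " raw, " + deflate(binary).length + " deflated");
    }
}
```

The useful figure is the compressed size of each encoding, together with the CPU time spent getting there; raw sizes alone do not settle the question.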

Jörg
