Benefits of storing JSON documents in binary Smile format

Hi all,

I'm curious to hear about others' experience with storing JSON documents in
the binary JSON format Smile (http://wiki.fasterxml.com/SmileFormatSpec)
within ElasticSearch. For my particular index, initial testing has shown
that storing _source as Smile leads to a 1/8th reduction in index size,
even with _source compression enabled on version 0.20.2. A few points I'd
like clarified:

  • Does ES natively store Smile submitted _source as Smile?
  • Does ES natively store Smile submitted fields as Smile?
  • Does ES document its support for Smile? If so, where?
  • When is Smile recommended over JSON?
  • Are there any downsides? (e.g. large CPU tradeoffs)
  • Are others using Smile successfully to store their documents more
    efficiently?

Related posts:

[1] https://groups.google.com/a/elasticsearch.com/d/msg/users/xiKhxl4gn-I/ictJ1OjiVfgJ
[2] https://groups.google.com/a/elasticsearch.com/forum/?fromgroups=#!searchin/users/binary$20json/users/Rfw53vLBNcQ/e0UOpVQQ5QkJ
[3] https://groups.google.com/forum/?fromgroups=#!searchin/elasticsearch/Json$20Binary$20Format/elasticsearch/iA1x7IacekI/timLZvXiG50J

Thank you in advance!

Bob

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Well, not much of a response, but 30 views, so there must be some interest
besides mine :)

I'd be very thankful if an ES dev could comment. I tried to ask this
question on the IRC channel with no response.

Thanks again!

Bob


ES devs seem busy now :)
https://twitter.com/bleskes/status/327108971615879168/photo/1

I'm not an ES dev, but I can read the ES source, so let me try to answer:

  • Does ES natively store Smile submitted _source as Smile?

Yes.

  • Does ES natively store Smile submitted fields as Smile?

Yes.

  • Does ES document its support for Smile? If so, where?

In the source ;) ES uses SMILE internally for cluster metadata,
percolator query metadata, and index warmer metadata.

It's available to Java developers in the XContent submodule and is easy
to use, because it is 100% analogous to JSON content when using the
XContent builder API.

  • When is Smile recommended over JSON?

Just check out the design goals:
http://wiki.fasterxml.com/SmileFormatDesignGoals

As with all compact binary formats, SMILE helps most when you have
large documents.
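
One likely reason, as far as I read the Smile spec, is that it can replace repeated property names with short back-references, and large documents tend to repeat the same keys many times. The sketch below does not encode Smile at all; it only counts how many bytes of a JSON document are key text that has already appeared once, as a rough indicator of how much a name-sharing format has to work with. The function name `repeated_key_bytes` and the sample document are made up for illustration.

```python
import json

def repeated_key_bytes(doc) -> int:
    """Count UTF-8 bytes of key text seen after a key's first appearance.

    This is only a rough indicator, not a Smile size calculation.
    """
    seen = set()
    repeated = 0

    def walk(node):
        nonlocal repeated
        if isinstance(node, dict):
            for key, value in node.items():
                if key in seen:
                    repeated += len(key.encode("utf-8"))
                else:
                    seen.add(key)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(doc)
    return repeated

# Hypothetical sample document: the same keys repeat in every array element.
doc = {"items": [{"name": "a", "price": 1},
                 {"name": "b", "price": 2},
                 {"name": "c", "price": 3}]}

total = len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))
print(repeated_key_bytes(doc), "of", total, "JSON bytes are repeated key text")
```

Running this over a sample of your own documents gives a feel for whether keys dominate; the actual savings still have to be measured on a real index, as in the original test.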

If you can embed ES into your app without a JSON requirement, you can
happily use SMILE.

  • Are there any downsides? (e.g. large CPU tradeoffs)

There is no free lunch. Higher CPU time competes with shorter network
transmission time. Clients of the REST HTTP API expect JSON in 99% of
all cases, so deserialization will be invoked in the REST layer. If you
are on the Java transport level rather than REST HTTP, you should just
use the XContent API to retrieve the stored fields. SMILE is
autodetected by a magic cookie, and if you have a target to deserialize
the document to, you just "sit" on the Jackson SMILE API via the
XContent API. Otherwise, you have to decode SMILE yourself. If you use
both JSON and SMILE in your apps, you may have to switch back and forth,
which might be a source of confusion, but that is a design question
outside ES.
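
To illustrate the magic cookie: per the Smile format spec, an encoded document with a header starts with the three bytes `:)\n`, followed by a fourth byte carrying version and flags. The following is a minimal sketch of that kind of sniffing, not the detection code ES itself uses; the function names and the JSON fallback check are assumptions for illustration.

```python
SMILE_MAGIC = b":)\n"  # first three header bytes from the Smile format spec

def looks_like_smile(payload: bytes) -> bool:
    """True if payload begins with the Smile magic plus a 4th header byte."""
    return len(payload) >= 4 and payload[:3] == SMILE_MAGIC

def sniff_format(payload: bytes) -> str:
    """Guess whether a byte payload is Smile or JSON (illustrative only)."""
    if looks_like_smile(payload):
        return "smile"
    # Assumed fallback: JSON documents start with an object or an array.
    if payload.lstrip()[:1] in (b"{", b"["):
        return "json"
    return "unknown"
```

A caller could branch on `sniff_format(source_bytes)` before choosing a decoder. Note that the spec makes the header optional for encoders, so a check like this only works on data that was written with it.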

  • Are others using Smile successfully to store their documents more
    efficiently?

No, but only because I have postponed my SMILE tests; they are simply
lower priority than other things (data modeling). My index is 30 GB on
disk with 30m docs on 0.90, so no reason for action :)

Jörg

On 24.04.13 13:43, btiernay wrote:


Wow, a great response, Jörg. Your insight is much appreciated.

Your point about application-level optimization is well taken. That is
obviously the first thing one wants to investigate.

Cheers!

Bob

On Wednesday, 24 April 2013 13:19:23 UTC-4, Jörg Prante wrote:
