Hi,
While watching the "What's new in Elasticsearch 0.90?" webinar, I noticed
one change: String data is now encoded in UTF-8, which for many languages
saves space, since an ASCII character encodes to a single byte rather than
the two it takes in a UTF-16 / UCS-2 encoding.
However, for some scripts, notably CJK, UTF-8 needs 3 or sometimes 4 bytes
per character. So, for a site indexing primarily CJK text (Chinese,
Japanese, Hangul and some other Asian scripts), the String storage will
INCREASE by a good amount (50% or so, since 3 bytes replace 2).
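
To put a rough number on that, here is a minimal Python sketch comparing the raw encoded size of a few strings in UTF-8 and UTF-16. The function name and sample strings are my own, purely for illustration. It measures plain encoded byte counts only and says nothing about how Elasticsearch or Lucene actually lay out or compress data on disk.

```python
def encoded_sizes(text: str) -> dict:
    """Return the raw byte length of `text` in UTF-8 and in UTF-16."""
    return {
        "utf8": len(text.encode("utf-8")),
        # "utf-16-le" is used so the count excludes the 2-byte
        # byte-order mark that the plain "utf-16" codec prepends.
        "utf16": len(text.encode("utf-16-le")),
    }


# Illustrative samples only (hypothetical data, not from any real index).
samples = {
    "ascii": "elasticsearch",
    "chinese": "\u4e2d\u6587",   # two CJK ideographs
    "korean": "\ud55c\uae00",    # two precomposed Hangul syllables
}

for name, text in samples.items():
    sizes = encoded_sizes(text)
    ratio = sizes["utf8"] / sizes["utf16"]
    print(f"{name}: utf8={sizes['utf8']} utf16={sizes['utf16']} ratio={ratio:.2f}")
```

For characters in the Basic Multilingual Plane, which covers most everyday CJK text, this gives 3 bytes against 2, which is where the 50% figure comes from. Characters that need 4 bytes in UTF-8 also need 4 bytes (a surrogate pair) in UTF-16, so those do not grow.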
Is there a way (or will there be a way) to specify the character encoding
used for String type fields? I looked through the Elasticsearch guide
and couldn't find anything under install / setup / mapping / index creation.
Thanks!
Bob.