You are using an ngram tokenizer, which explodes the index size. If you use
the ES default sharding, you have 5 shards (and therefore 5 Lucene indexes).
With ngrams, the same tokens end up scattered over all shards, so each shard
repeats much of the same term dictionary, and in the worst case that part of
the index approaches 5x the space compared to 1 shard.
Also, store = yes for each field is kind of clumsy. You have to request
each stored field explicitly to get it returned for a query (only _source is
returned by default). I don't see much sense in making an ngram-analyzed
field stored. Can you elaborate?
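
To make the ngram point concrete, here is a rough Python sketch, not Elasticsearch code. The gram bounds (2 to 4), the sample words, and the round-robin routing are all assumptions for illustration; the real :ngram-index analyzer settings are not shown in this thread, and ES routes documents by a hash of the id rather than round-robin.

```python
def ngrams(text, min_gram, max_gram):
    """Return every substring of text with length min_gram..max_gram."""
    return [
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    ]


def unique_terms_per_shard(docs, num_shards, min_gram, max_gram):
    """Count distinct ngram terms held by each shard.

    Round-robin routing is a stand-in (assumption) for real ES routing.
    """
    shards = [set() for _ in range(num_shards)]
    for doc_id, text in enumerate(docs):
        shards[doc_id % num_shards].update(ngrams(text, min_gram, max_gram))
    return [len(terms) for terms in shards]


# Hypothetical sample data with heavily overlapping substrings.
docs = ["search", "searches", "searching", "researcher", "searched"]

one_shard = unique_terms_per_shard(docs, 1, 2, 4)
five_shards = unique_terms_per_shard(docs, 5, 2, 4)

print("tokens for one word:", len(ngrams("elastic", 2, 4)))
print("distinct terms, 1 shard:", sum(one_shard))
print("distinct terms, 5 shards:", sum(five_shards))
```

The total over 5 shards is larger because terms such as "se" or "ear" are stored once per shard that sees them. This only models the term dictionary; postings and stored fields are not duplicated this way.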
Jörg
On 22.05.13 11:08, Shlomi wrote:
Does ES store its numeric fields as strings?
Can someone confirm that if you disable _source and keep each field as
stored and indexed, your fields become invisible (although still
queryable)? Or am I doing something totally wrong?
Thanks
On Tuesday, May 21, 2013 7:10:07 PM UTC+3, Shlomi wrote:
Here is a fraction of the mapping I have (I use Clojure, so it's a
bit different from JSON, but it's essentially the same):
{:test {:_source    {:enabled "false"}
        :_all       {:enabled "false"}
        :properties {:gram {:type "string" :store "yes"
                            :analyzer :ngram-index :compress "true"}
                     :freq {:type "long" :store "yes"}}}}
On Tuesday, May 21, 2013 7:07:44 PM UTC+3, Shlomi wrote:
Hey,
thanks all, let me reply:
Michael - no, I set replicas to 0 (if that's what you meant).
Itamar & Matt - I disabled _all and _source, and explicitly
set "store" to "yes" for both fields (I don't care about perf
for now). With this setting I still got a much larger size
and was still unable to see the fields through queries, even
though I set store to yes (I only got ids back).
On Tuesday, May 21, 2013 7:03:19 PM UTC+3, Matt Weber wrote:
Don't forget about the _all field. Also, if you don't
store the source, you need to explicitly set "store" to
yes on your field mappings so you can have them returned
in the results.
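
As a rough illustration of "explicitly" here: with _source disabled, the search request itself also has to name the stored fields it wants back. The Python sketch below only builds and prints such a request body. The field names gram and freq come from the mapping quoted in this thread; the match_all query is an assumption for illustration. The "fields" parameter is what ES releases of this era use; later releases renamed it to "stored_fields".

```python
import json

# Request body that asks for stored fields by name.
# "gram" and "freq" are the field names from the mapping in this thread;
# the match_all query is only a placeholder (assumption).
search_body = {
    "query": {"match_all": {}},
    "fields": ["gram", "freq"],
}

print(json.dumps(search_body, indent=2))
```

Without a field list, hits for an index that has no _source carry only metadata such as the id, which matches the behaviour described in this thread.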
On Tue, May 21, 2013 at 8:59 AM, Shlomi
<shlomi...@gmail.com> wrote:
Yes, so I was trying to exclude the source, but then
queries didn't return anything besides the id. But in any
case, even disabling the source still gave me a large index.
Is there any way to tell it to save just the fields?
On Tuesday, May 21, 2013 6:54:38 PM UTC+3, Itamar
Syn-Hershko wrote:
Yes, because ES stores the entire source by default.
On Tue, May 21, 2013 at 6:53 PM, Shlomi
<shlomi...@gmail.com> wrote:
Hey,
We have some old Java code that uses Lucene
and Grizzly to serve queries over text. We
have two fields, a string field and a numeric
(long) field. The indexing code is pretty
straightforward.
I was trying to migrate this to Elasticsearch
with a pretty simple configuration, and indexed
the same data.
The Java-based implementation took about 6 GB,
while Elasticsearch took 17 GB.
Does this make sense? What could I do about
this?
Thanks!
--
You received this message because you are
subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop
receiving emails from it, send an email to
elasticsearc...@googlegroups.com.
For more options, visit
https://groups.google.com/groups/opt_out
<https://groups.google.com/groups/opt_out>.