Strange behaviour in field cache use?

Hi, everyone.

I ran some tests today and found something I find rather odd.

I have an index of approximately 26,000 docs, running on version 0.19.7
(quite old, I'm aware).

In those docs, among other things, I have two fields:
sortString: a string, not analyzed, containing 9-10 digits (e.g.
"999999999")
sortDouble: a double (e.g. 999999999.0005)

As I understand it, ES puts all of those values into the field cache in
order to sort.
So, if such a string takes about 56 bytes in memory and a double takes
8 bytes, I should use a lot less cache when sorting on the latter.
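
The estimate above can be written out as a small calculation. This is only a sketch of the naive model: the 56-byte and 8-byte sizes are the figures quoted in the post, the name `naive_cache_bytes` is made up for illustration, and the model assumes every document stores its own copy of the value, with no sharing and no per-array overhead.

```python
# Naive model: one stored value per document, nothing shared between docs.
DOC_COUNT = 26_000    # approximate number of docs in the index (from the post)
STRING_BYTES = 56     # assumed in-memory size of one 9-10 digit string
DOUBLE_BYTES = 8      # size of one double


def naive_cache_bytes(doc_count: int, bytes_per_value: int) -> int:
    """Bytes needed if every document keeps its own copy of the value."""
    return doc_count * bytes_per_value


string_estimate = naive_cache_bytes(DOC_COUNT, STRING_BYTES)
double_estimate = naive_cache_bytes(DOC_COUNT, DOUBLE_BYTES)

print(string_estimate, double_estimate)   # bytes for each field
print(string_estimate / double_estimate)  # expected size ratio
```

Under this model the string field would need seven times the memory of the double field, which is the gap the measurements do not show.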

The thing is that both sorts use a similar amount of cache: 9.3 MB for the
strings and 9.1 MB for the doubles.
Is that normal? Is there something I did not understand about field cache use?
Any insight on the matter would be very helpful.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Elasticsearch uses Lucene and Lucene uses an inverted index, which is
different from what you are used to in an RDBMS. The cache consists of Java
objects holding object references, and object references are almost of
equal size for all types of values in the index. You will see no big
differences if you just take care of the field data type.

For range search, Lucene uses tree-like structures for integers. Dates
are stored as longs. There are also advanced techniques, such as compressed
bitmaps, which affect field caching to save memory.

Jörg

On 29.03.13 14:23, DH wrote:

The thing is that both sorts use a similar amount of cache: 9.3 MB for
the strings and 9.1 MB for the doubles.
Is that normal? Is there something I did not understand about field cache
use?


Wow, that was... quick!
Thanks a lot, Jörg. Now it makes sense.

However, I seem to be getting slightly better response times when sorting on
doubles, so I assume it is better, resource-wise, to sort on doubles
rather than on strings (it's easier to compare two doubles than to compare
two strings, especially when those strings often differ only in their last
few characters).
Am I right?
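
The intuition about strings that differ only at the end can be illustrated with a small sketch. It counts how many characters a plain left-to-right comparison has to look at before it can decide; the helper `chars_examined` is hypothetical, and nothing here claims that Lucene or ES compares raw strings this way when sorting, since that depends on the implementation.

```python
def chars_examined(a: str, b: str) -> int:
    """Count the characters a left-to-right comparison inspects before deciding."""
    examined = 0
    for x, y in zip(a, b):
        examined += 1
        if x != y:
            break  # first difference found, the comparison can stop here
    return examined


# Strings that share a long prefix force the comparison to walk most of it.
late_difference = chars_examined("999999998", "999999999")
# Strings that differ at the first character are decided immediately.
early_difference = chars_examined("199999999", "999999999")

print(late_difference, early_difference)
```

Comparing two doubles, by contrast, is a single numeric comparison regardless of how close the values are.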


Hiya

In those docs, among other things, I have two fields:
sortString: a string, not analyzed, containing 9-10 digits
(e.g. "999999999")
sortDouble: a double (e.g. 999999999.0005)

As I understand it, ES puts all of those values into the field cache
in order to sort.
So, if such a string takes about 56 bytes in memory and a double takes
8 bytes, I should use a lot less cache when sorting on the latter.

The thing is that both sorts use a similar amount of cache: 9.3 MB for
the strings and 9.1 MB for the doubles.

It depends on how many unique values you have. ES builds an array of
unique values, then an array for the docs, with each entry pointing at
that doc's unique value.

So if every doc has a unique value, then you will see a big difference
in size. If you only have a few unique values, then the size will be almost
identical.
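
The layout described above can be sketched in a few lines. This only illustrates the idea (a deduplicated value array plus one pointer per document) and is not Elasticsearch's actual field cache code; `build_field_cache` and the sample values are hypothetical.

```python
def build_field_cache(doc_values):
    """Return (unique_values, ordinals).

    unique_values: the distinct values, sorted.
    ordinals: for each doc, the index of its value in unique_values.
    """
    unique_values = sorted(set(doc_values))
    position = {value: i for i, value in enumerate(unique_values)}
    ordinals = [position[value] for value in doc_values]
    return unique_values, ordinals


# Five docs, but only three distinct values.
docs = ["999999999", "123456789", "999999999", "123456789", "555555555"]
unique_values, ordinals = build_field_cache(docs)

print(unique_values)  # each distinct value stored once
print(ordinals)       # one small integer per doc
```

The per-document array has the same length whatever the field type, so the type only changes the size of the unique-value array.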

clint


Thanks a lot, Clint.
I think I'm getting the hang of this.

So, I have approximately four times more unique values for the doubles
than I have for the strings. However, as doubles are so small compared to
strings, I get approximately the same cache use from ES with the two.

That's interesting, thanks again.
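
That trade-off can be put into rough numbers. The sketch below is a back-of-the-envelope estimate, not a measurement: the unique-value count of 5,000 and the 4-byte pointer size are assumptions chosen for illustration, and `estimated_cache_bytes` is a made-up helper. Only the doc count, the 56-byte and 8-byte sizes, and the factor of four come from the thread.

```python
DOC_COUNT = 26_000    # approximate number of docs (from the thread)
POINTER_BYTES = 4     # assumed size of one per-document pointer


def estimated_cache_bytes(unique_count: int, bytes_per_value: int) -> int:
    """Unique-value array plus one pointer per document."""
    return unique_count * bytes_per_value + DOC_COUNT * POINTER_BYTES


unique_strings = 5_000                # hypothetical count, not from the thread
unique_doubles = 4 * unique_strings   # "four times more" unique doubles

string_bytes = estimated_cache_bytes(unique_strings, 56)
double_bytes = estimated_cache_bytes(unique_doubles, 8)

print(string_bytes, double_bytes)
print(string_bytes / double_bytes)
```

With these made-up numbers the two fields end up within a factor of about 1.5 of each other, compared with a factor of 7 if every document stored its own copy of the value.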
