Comparison of int/long and float/double types

I get the impression that using the 'long' type instead of 'integer' would
use more disk space and degrade search performance (similarly for double
instead of float), but there's nothing in the documentation to back this
impression up.

There must be an advantage to using integer (if you can) because otherwise
it wouldn't exist. The documentation just doesn't say what the advantage is.

Can someone confirm? Even better, does anyone have any stats on what
difference it would make?

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ff73f7bc-0578-44ea-803f-39fe723b5764%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Lucene only knows how to index text strings. Numeric types are stored as
tries, and tries work on variable-length keys, so integer and long differ
only in the API used to convert them to tries. Tries are the basis for
numeric range searches.

It is a myth that longs take more disk space than ints in an inverted index
like Lucene's. Both long and integer (numeric types) take a bit more space
than text strings, but for large indices this does not add up to anything
significant; it is in the noise.

For field caches/filters and doc values, the difference between integer and
long matters more. But other aspects, such as field cardinality, also
determine the overall storage volume required.

Jörg
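
To make the trie idea above concrete, here is a minimal Python sketch of
prefix terms. It is a simplification under stated assumptions, not Lucene's
actual encoding: the function name prefix_terms and the parameters bits and
step are hypothetical, and only non-negative values are handled. Each value
is indexed at several precisions by shifting away its low bits, so a range
query can match a few coarse terms instead of every value in the range.

```python
def prefix_terms(value, bits=32, step=8):
    """Return (shift, prefix) pairs for a non-negative integer.

    Hypothetical helper for illustration only: each pair is the value
    with its lowest `shift` bits dropped, i.e. one level of the trie.
    """
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the given bit width")
    return [(shift, value >> shift) for shift in range(0, bits, step)]


# One value produces one term per precision level.
for shift, prefix in prefix_terms(0x12345678):
    print(shift, hex(prefix))

# Every value in 0x12340000..0x1234FFFF shares the coarse term (16, 0x1234),
# so that whole range can be matched through a single term.
low_terms = prefix_terms(0x12340000)
high_terms = prefix_terms(0x1234FFFF)
print((16, 0x1234) in low_terms and (16, 0x1234) in high_terms)
```

The step size trades the number of terms indexed per value against the
number of terms a range query has to visit.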

(In reply to Tim S's message of Fri, Oct 24, 2014 at 3:22 PM.)

So let's assume the cardinality is the same. Let's assume I have no text, I
only index numeric fields.

If I've got a range of data that would all fit within the bounds of an
integer, is there any reason not to index it as a long? Are there any
downsides? It sounds like you're saying that there aren't?

(In reply to Jörg Prante's message of Friday, October 24, 2014 2:55:24 PM UTC+1.)

It depends on what you do with ints. Your question was about disk storage.

Ints are much faster when they are loaded into cache: they save 50% memory,
they can be used as array indices for sorting, a CPU load/store instruction
on them takes only one cycle, etc.

Jörg
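
The 50% memory figure above can be illustrated with a small Python sketch
that packs the same values as fixed-width 32-bit and 64-bit integers. This
only shows raw array sizes; it is not a measurement of Elasticsearch field
data or doc values, which may encode values differently. The helper name
packed_size is hypothetical.

```python
import struct


def packed_size(values, fmt):
    """Return the byte count of `values` packed with struct code `fmt`.

    The "<" prefix selects standard sizes: "i" is 4 bytes, "q" is 8 bytes.
    """
    return len(struct.pack(f"<{len(values)}{fmt}", *values))


values = list(range(1000))  # every value fits in a signed 32-bit integer

int32_bytes = packed_size(values, "i")  # 32-bit storage
int64_bytes = packed_size(values, "q")  # 64-bit storage

print(int32_bytes, int64_bytes, int32_bytes / int64_bytes)
```

For values that fit in 32 bits, the 32-bit array needs half the bytes of the
64-bit one, which is where the in-memory saving comes from.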

(In reply to Tim S's message of Fri, Oct 24, 2014 at 4:48 PM.)