Comparison of int/long and float/double types

Tim_S · October 24, 2014, 1:22pm

I get the impression that using the 'long' type instead of 'integer' would
use more disk space and degrade search performance (similarly for double
instead of float), but there's nothing in the documentation to back this
impression up.

There must be an advantage to using integer (if you can) because otherwise
it wouldn't exist. It just doesn't say what the advantage is.

Can someone confirm? Even better, does anyone have any stats on what
difference it would make?

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ff73f7bc-0578-44ea-803f-39fe723b5764%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jprante · October 24, 2014, 1:55pm

Lucene only knows how to index text strings. Numeric types are stored as
tries, and tries work on variable-length keys, so the only difference is
the API used to convert an integer or a long into trie terms. Tries are
the basis for numeric range searches.
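
As a rough illustration of the trie idea, here is a minimal Python sketch that turns one number into a handful of prefix terms of decreasing precision. The function name `prefix_terms` and the step values 8 and 16 are assumptions chosen for the example, not Lucene's API; real Lucene also encodes the sign and the shift into the term bytes, which this sketch ignores.

```python
def prefix_terms(value, bits, precision_step):
    """Return (shift, prefix) pairs for one non-negative value.

    Each pair stands for one indexed term: the value with its lowest
    `shift` bits dropped. shift=0 is the full-precision term.
    Hypothetical helper for illustration only, not a Lucene call.
    """
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in %d bits" % bits)
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]


# The same value treated as a 32-bit int and as a 64-bit long.
# The step sizes are assumed values picked for the example.
int_terms = prefix_terms(1000, bits=32, precision_step=8)
long_terms = prefix_terms(1000, bits=64, precision_step=16)

print(int_terms)
print(long_terms)
```

With these assumed steps both calls produce four terms, so the number of terms per value depends on the bit width divided by the step rather than on the type name alone.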

It is a myth that longs take more disk space than ints in an inverted index
like Lucene. Both long and integer (numeric types) take a bit more space
than text strings, but for large indices this does not add up to anything
significant; it is in the noise.

For field caches/filters and doc values, the difference between integer
and long matters more. But other aspects, such as field cardinality, also
determine the overall storage volume required.

Jörg

Tim_S · October 24, 2014, 2:48pm

So let's assume the cardinality is the same. Let's assume I have no text, I
only index numeric fields.

If I've got a range of data that would all fit within the bounds of an
integer, is there any reason not to index it as a long? Are there any
downsides? It sounds like you're saying that there aren't?

jprante · October 24, 2014, 3:04pm

It depends on what you do with ints. Your question was about disk storage.

Ints are much faster when they are loaded into cache: they use 50% less
memory than longs, they can be used as array indices for sorting, loading
or storing one with a CPU instruction takes only one cycle, etc.
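
To put a number on the 50% figure, here is a small Python sketch that packs the same values as fixed-width 32-bit and 64-bit integers and compares the byte counts. It only models a plain in-memory array of raw values, which is an assumption about what a cache of field values looks like; it says nothing about how Lucene lays out data on disk.

```python
import struct

# 1000 sample values that all fit comfortably in a signed 32-bit integer.
values = list(range(1000))

# "<i" is a little-endian 4-byte signed int, "<q" an 8-byte signed long.
as_int = struct.pack("<%di" % len(values), *values)
as_long = struct.pack("<%dq" % len(values), *values)

print(len(as_int), len(as_long))  # 4000 8000
```

The same values take twice the bytes as longs, which is where the memory saving for ints comes from once the values are held as raw arrays.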

Jörg
