What type to choose to index a field that has only a few possible values (more than 'boolean' but fewer than 'short')

Hello List,

I would like to index photo metadata, and one of the fields to index is the
aperture. This is the 'f' number, which typically takes only values from a
predefined set. The set of possible values is more or less:
1.4, 2, 2.8, 4, 5.6, .... , 32
(powers of square root of 2, i.e. sqrt(2)^N where N is (approx.) in {0..12})

  1. What would be the best type to use to store this kind of value?
    'float'? or a mapping to 'short' or 'integer'?

If a number type is chosen, what would be the optimal precision step to
use? (assuming one wants to do range queries)

If one has only 2^4=16 possible values for a given field, what is the
effect of using a type that can store 2^16=65536 different values?

Or is it more effective to choose a mapping to (say) the first 16 letters
of the alphabet and use the type 'string'?

  2. Also, another related question: what is the effect of using a boolean
    value on the index? In databases, it is often not a good idea to use an
    index on boolean se

Re: Putting an INDEX on a boolean field?
http://archives.postgresql.org/pgsql-sql/2005-06/msg00215.php

In Elasticsearch, is it better to index the boolean field, or to do the
filtering in the client app? Is there a risk of making the index needlessly
larger?

As you can see, this question is a bit more about indexing in general than
about elasticsearch in particular but I ask here as I want to use
elasticsearch ;-)

Thanks in advance!
TuXRaceR

--

Hi,

On Wednesday, November 14, 2012 3:32:53 AM UTC+13, tuxracer69 wrote:

Hello List,

I would like to index photo metadata, and one of the fields to index is the
aperture. This is the 'f' number, which typically takes only values from a
predefined set. The set of possible values is more or less:
1.4, 2, 2.8, 4, 5.6, .... , 32
(powers of square root of 2, i.e. sqrt(2)^N where N is (approx.) in
{0..12})

  1. What would be the best type to use to store this kind of value?
    'float'? or a mapping to 'short' or 'integer'?

Given you're using decimal values, sounds like you should use float.

If a number type is chosen, what would be the optimal precision step to
use? (assuming one wants to do range queries)

The ideal precisionStep for floats is 4, which happens to be the default
value chosen by ElasticSearch.

If one has only 2^4=16 possible values for a given field, what is the
effect of using a type that can store 2^16=65536 different values?

Inside a Lucene index (like that used by ElasticSearch) there isn't really
typing; there are just terms, much like you'd find in the index at the back of
an encyclopaedia, so there won't be any effect.

Or is it more effective to choose a mapping to (say) the first 16
letters of the alphabet and use the type 'string'?

  2. Also, another related question: what is the effect of using a boolean
    value on the index? In databases, it is often not a good idea to use an
    index on boolean se

Re: Putting an INDEX on a boolean field?
http://archives.postgresql.org/pgsql-sql/2005-06/msg00215.php

In Elasticsearch, is it better to index the boolean field, or to do the
filtering in the client app? Is there a risk of making the index needlessly
larger?

It's absolutely fine to use boolean values in ElasticSearch. They are not
treated any differently to other values and you shouldn't experience any
problems.

--

On 11/13/2012 2:44 PM, Chris Male wrote:

Hi,

On Wednesday, November 14, 2012 3:32:53 AM UTC+13, tuxracer69 wrote:

Hello List,

I would like to index photo metadata, and one of the fields to index
is the aperture. This is the 'f' number, which typically takes only
values from a predefined set. The set of possible values is
more or less:
1.4, 2, 2.8, 4, 5.6, .... , 32
(powers of square root of 2, i.e. sqrt(2)^N where N is (approx.)
in {0..12})

1) What would be the best type to use to store this kind of value?
'float'? or a mapping to 'short' or 'integer'?

Given you're using decimal values, sounds like you should use float.

Actually, given that the values are from a fixed set of possible values,
I see no reason to use a float.

If a number type is chosen, what would be the optimal precision
step to use? (assuming one wants to do range queries)

The ideal precisionStep for floats is 4, which happens to be the
default value chosen by ElasticSearch.

If one has only 2^4=16 possible values for a given field, what is
the effect of using a type that can store 2^16=65536 different values?

Inside a Lucene index (like that used by ElasticSearch) there isn't
really typing; there are just terms, much like you'd find in the index at
the back of an encyclopaedia, so there won't be any effect.

Or is it more effective to choose a mapping to (say) the first
16 letters of the alphabet and use the type 'string'?

I would choose integers, because somewhere back in my client code I know
I would have a named constant standing for the value f/5.6,
e.g. var F_5_6 = 3;
or some map that takes 3 back to an enumerated constant.

Then range queries would be in terms of the integer indices representing
the range of all possible F-numbers.
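
As a rough illustration of such a range query, here is a minimal Java sketch that builds the request body as a plain string. The field name aperture_n and the class and method names are hypothetical (nothing in this thread fixes them), and it assumes the gte/lte form of the Elasticsearch range query.

```java
// Sketch only: builds a range query body over integer aperture indices.
final class ApertureRangeQuery {
    // Hypothetical field name holding the integer index of the f-number.
    static final String FIELD = "aperture_n";

    private ApertureRangeQuery() {}

    // Returns a JSON body matching documents whose index lies in
    // [minIndex, maxIndex], both bounds inclusive.
    static String rangeQuery(int minIndex, int maxIndex) {
        if (minIndex > maxIndex) {
            throw new IllegalArgumentException("minIndex must not exceed maxIndex");
        }
        return "{\"query\":{\"range\":{\"" + FIELD + "\":{"
                + "\"gte\":" + minIndex + ","
                + "\"lte\":" + maxIndex + "}}}}";
    }
}
```

Calling ApertureRangeQuery.rangeQuery(3, 6) would then select every photo whose stored index is 3, 4, 5 or 6, whatever f-numbers those indices stand for in your own table.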

-Paul

--

On 11/13/2012 11:51 PM, P. Hill wrote:

I would choose integers, because somewhere back in my client code I
know I would have a named constant standing for the value f/5.6,
e.g. var F_5_6 = 3;
or some map that takes 3 back to an enumerated constant.

Then range queries would be in terms of the integer indices
representing the range of all possible F-numbers.

-Paul

Thank you guys for your input. I will go with the int type and store the
number N from the formula above rather than the F number, as the
distribution of N will be more uniform.
TuXRaceR
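
For illustration, the mapping between an f-number and the exponent N in sqrt(2)^N could be sketched in Java as below. The class and method names are made up for this example, and it assumes the nominal markings (1.4, 5.6, 11, ...) are rounded forms of sqrt(2)^N, so that the index can be recovered as round(2 * log2(f)).

```java
// Sketch only: converts between an f-number and the exponent N in sqrt(2)^N.
final class ApertureIndex {
    private ApertureIndex() {}

    // N such that sqrt(2)^N is closest to the given f-number,
    // computed as round(2 * log2(f)).
    static int toIndex(double fNumber) {
        if (fNumber <= 0) {
            throw new IllegalArgumentException("f-number must be positive");
        }
        return (int) Math.round(2.0 * Math.log(fNumber) / Math.log(2.0));
    }

    // Exact value sqrt(2)^N, i.e. 2^(N/2). Nominal markings such as 5.6
    // are rounded versions of this value.
    static double toFNumber(int n) {
        return Math.pow(2.0, n / 2.0);
    }
}
```

With this sketch, toIndex(5.6) gives 5 and toIndex(32) gives 10, so the values 1.4 through 32 listed above land on N = 1..10.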

--

On 11/14/2012 2:34 AM, Tux raceR wrote:

Thank you guys for your input. I will go with the int type and store
the number N from the formula above rather than the F number, as the
distribution of N will be more uniform.
TuXRaceR

If you are using Java on your ES client side you can even have enums
that have methods. For example, an asFloat() method that gives you the
appropriate value of F_5_6 to use in the equation you mentioned. Other
possibilities include a method that would run the calculation for you
using values defined with each enumeration. I've found enums that can
convert themselves to other values to be very useful on occasion.
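
A minimal sketch of what such an enum might look like is below. Only the name asFloat() comes from the description above; the other method names are hypothetical, only the first five stops are listed, and the stored index is assumed to be the exponent N from sqrt(2)^N (a different choice from the F_5_6 = 3 example earlier in the thread).

```java
// Sketch only: an enum that knows its stored index and its f-number.
enum Aperture {
    F_1_4(1, 1.4f),
    F_2(2, 2.0f),
    F_2_8(3, 2.8f),
    F_4(4, 4.0f),
    F_5_6(5, 5.6f);

    private final int index;      // integer stored in the search index
    private final float nominal;  // f-number as marked on the lens

    Aperture(int index, float nominal) {
        this.index = index;
        this.nominal = nominal;
    }

    int index() {
        return index;
    }

    // Nominal f-number, e.g. 5.6f for F_5_6.
    float asFloat() {
        return nominal;
    }

    // Runs the calculation sqrt(2)^N for this constant.
    double exact() {
        return Math.pow(2.0, index / 2.0);
    }

    // Maps an integer read back from the index to its constant.
    static Aperture fromIndex(int index) {
        for (Aperture a : values()) {
            if (a.index == index) {
                return a;
            }
        }
        throw new IllegalArgumentException("unknown aperture index: " + index);
    }
}
```

Client code would index Aperture.F_5_6.index() and call Aperture.fromIndex(...) on the way back, so the integer encoding never leaks outside this one type.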

-Paul

--