Dealing with units


(Eric Jain) #1

I'm curious how others here index fields with different units, e.g. 5
miles and 10 km, and support queries (and facets) that then use either
miles or km.

Given a document with a property like:

distance : {
  value : 5,
  unit : 'miles'
}

I want to index:

distance : 8046.72, // normalized to m

But I'd still like _source to contain the original value and unit, so
the original document can be returned.
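
To make the normalization step concrete, here is a minimal Python sketch. The conversion table and the `to_meters` helper are illustrative assumptions and not part of Elasticsearch; only the 5 miles to 8046.72 m figure comes from the post above.

```python
# Conversion factors to meters (hypothetical table, extend as needed).
METERS_PER_UNIT = {
    "m": 1.0,
    "km": 1000.0,
    "miles": 1609.344,  # international mile
}


def to_meters(value, unit):
    """Convert a length in the given unit to meters."""
    try:
        factor = METERS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * factor


print(round(to_meters(5, "miles"), 2))  # 8046.72
print(to_meters(10, "km"))              # 10000.0
```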

One way to handle this right now is to index:

distance : { // don't index this
  value : 5,
  unit : 'miles'
},
_distance : 8046.72 // index this

...and rewrite queries to use _distance (after converting the values
in the query as required). For facets, value_script can be used.
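
A rough Python sketch of this workaround, for illustration only. The helpers `add_normalized` and `range_query` are hypothetical names, the `gte`/`lte` keys are assumed to be accepted by the range query in the Elasticsearch version in use, and the mapping that keeps `distance` itself out of the index is not shown.

```python
# Hypothetical conversion table; units and factors are assumptions.
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "miles": 1609.344}


def add_normalized(doc, field="distance"):
    """Return a copy of doc with a '_<field>' shadow value in meters.

    The original {value, unit} object is left untouched, so the stored
    document still carries what the user submitted.
    """
    measure = doc[field]
    out = dict(doc)
    out["_" + field] = measure["value"] * METERS_PER_UNIT[measure["unit"]]
    return out


def range_query(field, low, high, unit):
    """Build a range query on the shadow field, converting bounds to meters."""
    factor = METERS_PER_UNIT[unit]
    return {"range": {"_" + field: {"gte": low * factor, "lte": high * factor}}}


doc = add_normalized({"distance": {"value": 5, "unit": "miles"}})
print(sorted(doc))                         # ['_distance', 'distance']
print(range_query("distance", 1, 10, "km"))
# {'range': {'_distance': {'gte': 1000.0, 'lte': 10000.0}}}
```

The caller never has to know which unit a document was submitted in: the query bounds are converted once, and every document is compared in meters.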

But perhaps there is a better way (e.g. create a custom field type)?


(Karussell) #2

You could use the multi_field type and use one of its sub-fields for
normalization (e.g. to meters), or index both units: distance.km and
distance.miles

http://www.elasticsearch.org/guide/reference/mapping/multi-field-type.html
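
As a sketch of the "index both units" variant: as far as I know, multi_field indexes the same input value under several sub-fields and does not convert anything itself, so the per-unit values would still have to be computed before indexing. The `expand_units` helper and the conversion table below are assumptions made for illustration.

```python
# Hypothetical conversion table; units and factors are assumptions.
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "miles": 1609.344}


def expand_units(value, unit, targets=("km", "miles")):
    """Return the measurement expressed in each target unit.

    The result could be indexed as distance.km, distance.miles, and so on.
    """
    meters = value * METERS_PER_UNIT[unit]
    return {target: meters / METERS_PER_UNIT[target] for target in targets}


fields = expand_units(5, "miles")
print(fields)  # km is about 8.05, miles about 5.0
```

Every supported unit adds one more indexed field per measurement, so this approach gets expensive when there are many units.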

Peter.


(Eric Jain) #3

On Tue, Mar 6, 2012 at 00:13, Karussell <tableyourtime@googlemail.com> wrote:

> You could use the multi_field type and use one of its sub-fields for
> normalization (e.g. to meters), or index both units: distance.km and
> distance.miles
>
> http://www.elasticsearch.org/guide/reference/mapping/multi-field-type.html

That looks like a promising approach, though _source still wouldn't
contain the original document, but something like:

distance : {
  distance : { // not indexed
    value : 5,
    unit : 'miles'
  },
  si : 8046.72 // indexed
}

Right? I'd rather not create fields for every supported unit--in some
cases there can be quite a few...


(Shay Banon) #4

It's better to index a normalized value with the same unit across all docs. You can still index 5 and miles, but add another field that holds the normalized value (for example, in miles), and use that field when searching.


(Eric Jain) #5

On Tue, Mar 6, 2012 at 11:38, Shay Banon <kimchy@gmail.com> wrote:

> It's better to index a normalized value with the same unit across all
> docs. You can still index 5 and miles, but add another field that holds
> the normalized value (for example, in miles), and use that field when
> searching.

Creating separate, normalized fields is what I'm doing now. But I was
wondering how difficult it would be to create a custom "dimension"
type that normalizes units, so documents and queries don't need to be
pre/post processed explicitly?


(Shay Banon) #6

You can create a custom type that accepts a JSON object with the value and unit and automatically indexes a normalized field. It's hard to say how difficult that would be :)
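
To illustrate what such a type would have to do, here is a small Python sketch of the behavior only. It is a client-side stand-in and not an Elasticsearch mapper; the `{value, unit}` detection rule, the `_<name>` naming, and the factor table are all assumptions.

```python
# Hypothetical factors to SI base units; extend per dimension as needed.
SI_FACTORS = {
    "m": 1.0, "km": 1000.0, "miles": 1609.344,  # length -> meters
    "g": 0.001, "kg": 1.0,                      # mass -> kilograms
}


def is_measure(obj):
    """True for objects shaped exactly like {value: ..., unit: ...}."""
    return isinstance(obj, dict) and set(obj) == {"value", "unit"}


def normalize_document(doc):
    """Return a copy of doc with a '_<name>' SI value next to every measure."""
    out = {}
    for name, val in doc.items():
        if is_measure(val):
            out[name] = val  # keep the original value and unit
            out["_" + name] = val["value"] * SI_FACTORS[val["unit"]]
        elif isinstance(val, dict):
            out[name] = normalize_document(val)  # recurse into nested objects
        else:
            out[name] = val
    return out


print(normalize_document({"title": "run", "distance": {"value": 10, "unit": "km"}}))
# {'title': 'run', 'distance': {'value': 10, 'unit': 'km'}, '_distance': 10000.0}
```

A real custom type would apply the same rule inside the mapping layer, so neither documents nor queries would need explicit pre-processing.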

