Closest number query

Hello,
I'm working on an interesting problem, and I think it could be solved using ES
features.
I have millions of documents, each of which has a multi-field made up of 2 numbers.

Here is an example:
_source{
  some_data:{ related to the document..... },
  dna:{
    entry:{
      color:2397879,
      area:0.4
    },
    entry:{
      color:938893,
      area:0.3
    },
    entry:{
      color:3498438,
      area:0.2
    },
    entry:{
      color:894879,
      area:0.1
    }
  }
}

Visually, it is a rectangle in which each color covers the specified area
("1" is the total space, so 2397879 would cover almost half, etc.).
Now I need to write a query that takes a number as input. I need to find
all documents whose "color" field is closest to that number, with the "area"
field acting as a boost, so that if a color covers more space, the document
ranks higher in the search results.

First, I couldn't find the right way to search for the closest numbers. For
example, if one document has field value 4 and another document has value 6,
they are equally close to 5 and should be retrieved with the same score. I
could pick some kind of threshold and just run a range filter (from
x-threshold to x+threshold) on the "color" field, but this wouldn't return
the results in the right order.
The second step is boosting by another field's value. Any help here is also
appreciated.
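
To make the requested ranking concrete, here is a minimal sketch in plain Python, not Elasticsearch. The formula `area / (1 + distance)` and the names `entry_score`, `doc_score` and `rank` are assumptions for illustration only; the post does not say how closeness and area should be combined.

```python
def entry_score(color, area, target):
    """Higher is better: closer colors and larger areas score higher."""
    distance = abs(color - target)   # symmetric: 4 and 6 are equally far from 5
    return area / (1.0 + distance)   # assumed way to combine closeness and area


def doc_score(entries, target):
    """A document scores as its best-matching (color, area) entry."""
    return max(entry_score(color, area, target) for color, area in entries)


def rank(docs, target):
    """Return document ids ordered from best to worst match."""
    return sorted(docs, key=lambda doc_id: doc_score(docs[doc_id], target),
                  reverse=True)


# Hypothetical toy documents: each is a list of (color, area) pairs.
docs = {
    "a": [(4, 0.4), (100, 0.6)],
    "b": [(6, 0.4), (200, 0.6)],
    "c": [(5, 0.1), (300, 0.9)],
}
print(rank(docs, target=5))
```

With this assumed formula, documents "a" and "b" get the same score for target 5, while "c" ranks last even though it contains an exact match, because that color covers only a small area. A different weighting would change that trade-off.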

I'm kind of stuck here and need the help of an ES guru.
If you have any thoughts or suggestions, please share.
Any help is greatly appreciated.
Thank you,
Eugene

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I think a combination of the gradually expanding range filter that you have
suggested with a script sort might work. Script sort alone would be very
slow, but if you combine it with a range filter that significantly
reduces the number of results, it might actually
work: color_match.sh · GitHub
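
As a rough sketch of what such a request could look like (this is not the contents of the linked gist), the Python below builds a search body that filters by range and sorts by distance. It assumes a flat numeric field named `color` on each document and uses the current query DSL (`bool` filter, Painless script sort), which is newer than this thread; the helper names `build_search_body` and `expanding_thresholds` are hypothetical.

```python
import json


def build_search_body(target, threshold, size=10):
    """Range filter narrows the candidates; script sort orders them by distance."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"range": {"color": {"gte": target - threshold,
                                         "lte": target + threshold}}}
                ]
            }
        },
        "sort": [
            {
                "_script": {
                    "type": "number",
                    "order": "asc",  # smallest distance first
                    "script": {
                        "lang": "painless",
                        "source": "Math.abs(doc['color'].value - params.target)",
                        "params": {"target": target},
                    },
                }
            }
        ],
    }


def expanding_thresholds(start, factor=2, limit=1_000_000):
    """Yield ever wider thresholds: start, start*factor, ... up to limit."""
    threshold = start
    while threshold <= limit:
        yield threshold
        threshold *= factor


body = build_search_body(target=2397879, threshold=1000)
print(json.dumps(body, indent=2))
```

The idea would be to send the body with the first threshold and, if too few hits come back, retry with the next value from `expanding_thresholds`. Color/area pairs stored as nested entries, as in the question, would need a different mapping and script than this flat-field sketch.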

On Saturday, March 23, 2013 7:28:35 PM UTC-4, Eugene Strokin wrote:

[quoted text trimmed]

Your example helped a lot, thank you.
I did it in a similar way, but with a Custom Score Query and a script
based on your example.
Thanks again,
Eugene
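
For readers on newer versions: the Custom Score Query (`custom_score`) was later replaced by `function_score`. The sketch below builds such a request body in Python. It is not Eugene's actual script; the scoring formula, the flat `color` and `area` fields, and the helper name `build_score_body` are all assumptions.

```python
def build_score_body(target, size=10):
    """Score each hit by area, damped by its color distance to the target."""
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "script_score": {
                    "script": {
                        "lang": "painless",
                        # Assumed formula: larger area and smaller distance
                        # both raise the score; the result is never negative.
                        "source": ("doc['area'].value / "
                                   "(1.0 + Math.abs(doc['color'].value - params.target))"),
                        "params": {"target": target},
                    }
                },
                # Use the script result as the final score.
                "boost_mode": "replace",
            }
        },
    }


score_body = build_score_body(target=5)
```

In practice the `match_all` query would probably be swapped for a range filter around the target, so the script only runs on a small candidate set.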

On Monday, March 25, 2013 6:53:28 PM UTC-4, Igor Motov wrote:

[quoted text trimmed]