Geo_Bounding_Box Returning Bogus Results


(Shawn Evans) #1

I'm quite new to ElasticSearch, and I'm having some problems getting
accurate results out of the geo_bounding_box filter. It's my understanding
that given 2 points, the geo_bounding_box will return all results that fall
within the designated area defined by the points. I'm using Python with
the json library to get data into a proper _bulk index format, and a
'_bulk' request to insert the data into ES. Here's the gist of it:

  1. Create the index:

$ curl -XPUT 'http://localhost:9200/geodata/'
{"ok":true,"acknowledged":true}

  2. Create a mapping:

$ curl -XPUT 'http://localhost:9200/geodata/record/_mapping' -d '{
"record": { "properties" : { "city" : {"type" : "string"}, "region_code" :
{"type":"string"}, "country_code" : {"type":"string"},
"country_name":{"type":"string"}, "email":{"type":"string"}, "Location" : {
"type" : "geo_point" }, "area_code" : {"type":"string"} } } }'
{"ok":true,"acknowledged":true}

  3. Insert data in the following format (roughly 350k records):

{"index": {"_type": "record", "_id": 80, "_index": "geodata"}}
{"city": "Tokyo", "region_code": "40", "postal_code": null, "Location":
[139.7514, 35.685], "country_code": "JPN", "country_name": "Japan",
"email": "testing1@example.com", "area_code": 0}

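For reference, the two-line-per-document _bulk layout shown above can be produced with the json library along these lines. This is only a minimal sketch: `to_bulk_payload` is a hypothetical helper name, and the index/type defaults simply mirror the ones used in this post.

```python
import json

def to_bulk_payload(records, index="geodata", doc_type="record"):
    """Hypothetical helper: turn (doc_id, source) pairs into a _bulk body.

    Each document becomes two JSON lines: an action line, then the source.
    """
    lines = []
    for doc_id, source in records:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"

# The sample record from this post; "Location" is [lon, lat].
tokyo = {
    "city": "Tokyo", "region_code": "40", "postal_code": None,
    "Location": [139.7514, 35.685], "country_code": "JPN",
    "country_name": "Japan", "email": "testing1@example.com", "area_code": 0,
}

payload = to_bulk_payload([(80, tokyo)])
print(payload, end="")
```

The resulting string is what would be sent as the body of the '_bulk' request.
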
After indexing the data, I verified that everything worked properly:

$ curl 'http://localhost:9200/geodata/record/237718/'
{"_index":"geodata","_type":"record","_id":"237718","_version":1,"exists":true,
"_source" : {"city": "Reading", "region_code": "PA", "postal_code":
"19607", "Location": [-75.9523, 40.2851], "country_code": "USA",
"country_name": "United States", "email": "testing2@example.com",
"area_code": 610}}

  4. With the data indexed, I attempted to run a query with a
    geo_bounding_box filter:

$ curl
'http://localhost:9200/geodata/record/_search?pretty=true&size=1000000' -d
'{ "query" : { "filtered" : { "query" : { "match_all":{}},"filter": {
"geo_bounding_box": {"Lcation" : { "top_left" :[-75.961189,40.371659]
,"bottom_right ": [-75.892525,40.297858] } } } } } } } }'

  5. The request above returns results, but they are nowhere near the
    small bounding area defined in the query. The query should return
    results for a small town in Pennsylvania; instead, the results include
    hits from Portugal and Spain, which are well outside my bounding box.
    I've tried changing the Location (geo_point value) to every format
    defined in the geo_point reference guide
    (http://www.elasticsearch.org/guide/reference/mapping/geo-point-type/),
    to no avail. I've tried using a nested Location in the format of
    "Pin.location". I've also tried forgoing a mapping and relying on the
    'dynamic' type, but everything seems to return the same incorrect
    results. Any assistance or advice would be hugely appreciated. Thanks!
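
One way to sanity-check which documents ought to match, independently of Elasticsearch, is to test points against the box by hand. The sketch below is an illustration only: `in_bounding_box` is a hypothetical helper (not an Elasticsearch API), it assumes every point is a [lon, lat] pair as in the geo_point array format, and it assumes the box does not cross the 180° meridian.

```python
def in_bounding_box(point, top_left, bottom_right):
    """Hypothetical local check; all arguments are [lon, lat] pairs."""
    lon, lat = point
    left, top = top_left
    right, bottom = bottom_right
    # Inside means: between the left and right edges, and between bottom and top.
    return left <= lon <= right and bottom <= lat <= top

# Corners taken from the query in this post.
top_left = [-75.961189, 40.371659]
bottom_right = [-75.892525, 40.297858]

# Sample documents taken from this post.
reading = [-75.9523, 40.2851]
tokyo = [139.7514, 35.685]

print(in_bounding_box(reading, top_left, bottom_right))  # False
print(in_bounding_box(tokyo, top_left, bottom_right))    # False
```

Under these assumptions, the Reading record shown above (latitude 40.2851) sits just south of the box's bottom edge (40.297858), so that particular document would not be expected among the hits even with a working filter.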

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Shawn Evans) #2

So here's a follow-up:

It turns out that my query was returning bad results because I had a typo
in my filter:

"bottom_right " vs "bottom_right"

The key difference being the additional space in the first one. So with
that issue identified, I'm now not getting ANY results back:

$ curl
'http://localhost:9200/geodata/record/_search?pretty=true&size=1000000' -d
'{ "query" : { "filtered" : { "query" : { "match_all":{}},"filter": {
"geo_bounding_box": {"Lcation" : { "top_left" : [-75.961189, 40.371659]
,"bottom_right" : [-75.892525, 40.297858] } } } } } } } }'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Still zero clues as to why. I'm positive I'm doing something wrong, but
lack the experience to know what exactly. Thanks!
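
A stray space inside a key is easy to miss by eye, so a small check over the request body can help before it is sent. The following is a rough sketch, not part of any Elasticsearch client: `keys_with_stray_whitespace` is a hypothetical helper, and the query is written as a Python dict that mirrors the original request, with the "bottom_right " typo left in on purpose.

```python
def keys_with_stray_whitespace(obj, path=""):
    """Hypothetical helper: list paths of keys with leading/trailing whitespace."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            here = f"{path}/{key}"
            if key != key.strip():
                found.append(here)
            found.extend(keys_with_stray_whitespace(value, here))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            found.extend(keys_with_stray_whitespace(item, f"{path}/{i}"))
    return found

# Mirrors the first query in this thread, typos included.
query = {
    "query": {"filtered": {
        "query": {"match_all": {}},
        "filter": {"geo_bounding_box": {"Lcation": {
            "top_left": [-75.961189, 40.371659],
            "bottom_right ": [-75.892525, 40.297858],
        }}},
    }}
}

suspects = keys_with_stray_whitespace(query)
print(suspects)
```

This only catches whitespace problems; a misspelled field name would need a separate comparison against the mapping.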

(In reply to post #1, sent Thursday, September 19, 2013 9:26:37 AM UTC-4.)


(Shawn Evans) #3

So, I realize that the "Lcation" field name in my query is also a typo (it
should be "Location"), but that's an artifact of strange copy/paste behavior:
unprintable characters on the command line were doing funky things to
subsequent input (and apparently to copy/paste). Anyway, the actual query:

$ curl
'http://localhost:9200/geodude/record/_search?pretty=true&size=1000000' -d
'{ "query" : { "filtered" : { "query" : { "match_all":{}},"filter": {
"geo_bounding_box": {"Location" : { "top_left" : [-75.961189, 40.371659]
,"bottom_right" : [-75.892525, 40.297858] } } } } } } } }'

{
  "took" : 22,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Sorry about the multi post.
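
Since hand-typed request bodies have already caused two typos in this thread, one option is to build the body as a Python dict and let the json library serialize it. The sketch below is an assumption-laden illustration: `strict_parse_error` is a hypothetical helper, and `posted_body` is the request body from this post copied as-is. A strict parser reports trailing data for it, because it ends with more closing braces than it opens; whether Elasticsearch tolerates that is not established here.

```python
import json

# Request body copied from the query in this post.
posted_body = (
    '{ "query" : { "filtered" : { "query" : { "match_all":{}},"filter": { '
    '"geo_bounding_box": {"Location" : { "top_left" : [-75.961189, 40.371659] '
    ',"bottom_right" : [-75.892525, 40.297858] } } } } } } } }'
)

def strict_parse_error(body):
    """Hypothetical helper: return the json error message, or None if valid."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return exc.msg
    return None

# Same query built as a dict, so braces and key names come from code.
query = {
    "query": {"filtered": {
        "query": {"match_all": {}},
        "filter": {"geo_bounding_box": {"Location": {
            "top_left": [-75.961189, 40.371659],
            "bottom_right": [-75.892525, 40.297858],
        }}},
    }}
}
generated_body = json.dumps(query)

print(strict_parse_error(posted_body))     # Extra data
print(strict_parse_error(generated_body))  # None
```

The generated string can then be passed to curl with -d, or sent from Python directly.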

(In reply to post #2, sent Thursday, September 19, 2013 10:01:07 AM UTC-4.)

