Geo_bounding_box Filter for a n00b


(andrew jardine) #1

Hey Guys,

First off, my disclaimer. I'm a total n00b to ES. I'm trying to see if it can solve my current problem, which I am almost positive it can -- but time is of the essence here, and after several hours of monkeying around with various queries and things I have found online, I think it's time to ask the experts. Here is what I have.

First off, I am using Liferay Portal. My records that represent locations are part of the content management system and as such, adding a new content item results in Liferay adding the "document" to ES. I don't really have any control over how it is structured. Suffice it to say that it looks like this --

{
  "id": 1234,
  "status": 0,
  "city": "toronto",
  "state": "ontario",
  "country": "canada",
  "latitude": 43.761539,
  "longitude": -79.411079
}

the names are a little different in fact, and there are a ton more fields, but that's enough to get the point across. Now, I am trying to run a search, as mentioned, using the geo bounding box filter. I'm using google maps and as such I am able to get the N,S,E,W coordinates. The example that I am looking at (https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-geo-bounding-box-filter.html) suggests a (theoretical) structure like this:

{
  "pin" : {
    "location" : {
      "lat" : 40.12,
      "lon" : -71.34
    }
  }
}

and then to use this --

{
  "filtered" : {
    "query" : {
      "match_all" : {}
    },
    "filter" : {
      "geo_bounding_box" : {
        "pin.location" : {
          "top_left" : {
            "lat" : 40.73,
            "lon" : -74.1
          },
          "bottom_right" : {
            "lat" : 40.01,
            "lon" : -71.12
          }
        }
      }
    }
  }
}

.. but I am not sure how to plug in MY structure, because the lat/long I am using is not bundled in a parent object as it is in the example.

Can anyone help?


(David Pilato) #2

You need to put lat/lon in the source document within an object, let's say location.
First define a mapping where location is a geo_point.

Then you will be able to play with geo distance.
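A minimal sketch of that mapping (assuming ES 1.x and a mapping type named documents; the index name and field name are up to you):

```json
{
  "mappings": {
    "documents": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}
```

With location mapped as a geo_point, a geo filter can then reference it directly -- for example a geo_distance filter (the distance and coordinates here are hypothetical):

```json
{
  "filtered": {
    "query": { "match_all": {} },
    "filter": {
      "geo_distance": {
        "distance": "50km",
        "location": {
          "lat": 43.76,
          "lon": -79.41
        }
      }
    }
  }
}
```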


(andrew jardine) #3

Hey dadoonet,

Thanks so much for replying -- I think I am getting closer, but I am not sure. Let me explain a little more about this solution. By default, OOTB, Liferay comes with Lucene. I have installed a plugin that was shared with the community to use ES instead. Under the hood, Liferay provides a "search API" with generic classes so that you can write plugins to use whatever target search engine you like. So when this ES plugin was created, they essentially provided the Spring mappings to override the Liferay Impl classes (which use the Lucene API) to instead provide objects that use the ES API.

I am leveraging the web content management tool (as I think I mentioned), so I needed a way to get a single field with the lat/long into the index. Fortunately, Liferay provides a plugin type called a hook that allows me to create a PostIndexProcessor. This hook allows you to "alter" the document before it is sent to the index. So what I have done here is taken the individual lat and long field values, created an additional field (called "location") with the value lat,lon, and added that field to the document. The good news is that it worked and showed up! Unfortunately, the error I am getting now says that the field "location" is not of type geo_point.

Digging through the ES Plugin code, I found the template.json file that appears to define the field name and corresponding types. I added this to it --

{
  "template": "liferay_*",
  "mappings": {
    "documents": {

      "properties": {
        "location": {
          "type": "geo_point"
        },
        "uid": {
          "type": "string",

...

and then rebuilt my index. But when I hit http://localhost:9200/_mapping, it's showing the field as a string.

"location": {
"type": "string"
},

.. can you tell me if I am declaring the field type in the correct manner?


(andrew jardine) #4

Hi dadoonet,

I think I may have sourced the problem. One of the classes in the plugin is responsible for structuring the document and writing it to the index. You can see the class here: https://github.com/R-Knowsys/elasticray/blob/master/webs/elasticray-web/docroot/WEB-INF/src/com/rknowsys/portal/search/elastic/ElasticsearchIndexWriter.java

Check out line #291 and down. What I am seeing is that it tries to detect the field type and write the value accordingly, defaulting to String I would guess. The "Field" class in this case is part of Liferay's service API, which doesn't have detection/support for an Object, a JSONObject, or a GeoPoint type. So even if I wanted to fork this plugin and customize the logic, I think it would require a much bigger change, in that I would also have to modify some of Liferay's portal-service module. This is definitely not a road I want to go down unless I have absolutely no other options.

So coming full circle on this, perhaps I don't actually need a geo point? I need the behaviour that filter provides, but if I was able to store the latitude and longitude as numeric (float) values, then perhaps I could use a must with two range filters? Such that the lat must be > SOUTH and < NORTH (range 1) and the long must be > WEST and < EAST (range 2)? Do you think that would work?
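That alternative might look something like this in ES 1.x query DSL (a sketch, assuming the latitude and longitude fields are mapped as numbers; the bounding values are hypothetical coordinates around Toronto):

```json
{
  "filtered": {
    "query": { "match_all": {} },
    "filter": {
      "bool": {
        "must": [
          { "range": { "latitude":  { "gt": 43.58, "lt": 43.86 } } },
          { "range": { "longitude": { "gt": -79.64, "lt": -79.12 } } }
        ]
      }
    }
  }
}
```

The two range filters together keep only documents whose point falls inside both the south/north and west/east bounds, which is the same predicate a bounding box expresses.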

The original issue I faced with Lucene was that everything is treated as a string, so my range attempts failed: numeric values compared as strings produce inaccurate results.

Being a n00b to ES, does my alternate approach sound like something that might work?


(andrew jardine) #5

Just to close the loop on this, I have it working now! In the end I went with what my last post asked -- I am using the post index processor to create two additional numeric fields, one for the lat and one for the long, which allows me to use two "must" range queries to get records in a "bounding box". No doubt this is less performant than using the geo_bounding_box filter, but my data set is unlikely ever to be so big that it can't handle the load. Given the limitations I am faced with, I think it's a reasonable tradeoff -- and if I ever get some breathing room, I'll fork that plugin and build a 2.0 version with more flexibility.

Thanks dadoonet for answering. If nothing else, your reply gave me enough hope to use you as a rubber duck to figure this one out -- and I am really starting to understand and appreciate the power of ES. Definitely going to use it more in the future, and dust off those books I bought a while back to better my skills.


(David Pilato) #6

If you don't want to modify what your plugin is doing, you can maybe use an ingest pipeline, which is coming in Elasticsearch 5.0, and move your fields from:

{
  "id": 1234,
  "status": 0,
  "city": "toronto",
  "state": "ontario",
  "country": "canada",
  "latitude": 43.761539,
  "longitude": -79.411079
}

to

{
  "id": 1234,
  "status": 0,
  "city": "toronto",
  "state": "ontario",
  "country": "canada",
  "location": {
    "lat": 43.761539,
    "lon": -79.411079
  }
}

See https://www.elastic.co/guide/en/elasticsearch/reference/5.0/rename-processor.html
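A sketch of such a pipeline body (PUT to _ingest/pipeline/<some-name>; the pipeline name is a placeholder, and location would still need to be mapped as geo_point for geo queries to work):

```json
{
  "description": "Move flat latitude/longitude fields into a location object",
  "processors": [
    { "rename": { "field": "latitude",  "target_field": "location.lat" } },
    { "rename": { "field": "longitude", "target_field": "location.lon" } }
  ]
}
```

Indexing a document with ?pipeline=<some-name> would then produce the nested location object shown above.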


(andrew jardine) #7

Hey David,

I'll definitely check that out, but one of my limitations is that right now the community plugin only supports ES 1.4. I've added a TO-DO to my list to fork the project, though, to update it to the latest ES and improve a few of the features -- like supporting ALL the ES data types.

Thanks again for your help. I'm sure I'll be back :slight_smile:


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.