One of the most common questions from Elastic Stack users who deal with geospatial data is how to ingest that data into Elasticsearch. You can check the Kibana 7.10 docs to learn about the different ways to achieve this. Some time ago we wrote a blog post introducing ogr2ogr, a tool from the GDAL library that helps ingest data from dozens of formats into Elasticsearch.
In this Advent Calendar post, we develop an example of this workflow using Docker to leverage the latest version of the GDAL tools, and OpenStreetMap as a popular source of Open Data Points of Interest.
Using Docker, we avoid issues with versions and local environments, and we can pull the latest version of GDAL independently of our host operating system. Alternatively, you can install the GDAL library on your system, but make sure you are at least on version 3.1.
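If you want to confirm which version you have available, ogrinfo can print it for both a local install and the Docker image (a quick, optional check):
$ ogrinfo --version
$ docker run --rm osgeo/gdal:alpine-small-latest ogrinfo --version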
Set up: create an Elastic deployment
Go to https://cloud.elastic.co and create a new deployment. For this test, we created a 7.9.3 cluster with one 4GB RAM Elasticsearch node and a 2GB RAM Kibana node. Put the credentials in a .env file like this:
ELASTIC_HOST=<your-cluster-host>
ELASTIC_PORT=9243
ELASTIC_USER=<your-user>
ELASTIC_PASSWORD=<your-password>
ELASTIC_URL="https://${ELASTIC_USER}:${ELASTIC_PASSWORD}@${ELASTIC_HOST}:${ELASTIC_PORT}"
Source the variables and check that the cluster and credentials are correct using the ogrinfo tool:
$ source .env
$ docker run --rm \
osgeo/gdal:alpine-small-latest \
ogrinfo -summary "ES:${ELASTIC_URL}"
INFO: Open of `ES:https://<your-user>:<your-password>@<your-host>:<your-port>'
using driver `Elasticsearch' successful.
1: .kibana-event-log-7.9.3-000001 (None)
2: .apm-custom-link (None)
3: .kibana_task_manager_1 (None)
4: .apm-agent-configuration (None)
5: .kibana_1 (None)
All good: we can connect, and we can see that our cluster does not have any geospatial indexes at the moment.
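If you prefer a quick sanity check outside of GDAL, a plain request against the cluster root endpoint returns the cluster name and version (this assumes curl is available on your host and the .env variables are sourced):
$ curl -s "${ELASTIC_URL}"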
Importing points of interest from OpenStreetMap
Get the data
We will use the small country of Andorra for an initial step, and then our main area of interest will be New York state. Both datasets will be downloaded in pbf format from https://download.geofabrik.de.
$ wget https://download.geofabrik.de/europe/andorra-latest.osm.pbf
$ wget https://download.geofabrik.de/north-america/us/new-york-latest.osm.pbf
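Optionally, you can list the layers that GDAL exposes for one of these files before importing anything; the OSM driver should report points, lines, multilinestrings, multipolygons, and other_relations, which confirms the download is readable:
$ docker run --rm \
-v "${PWD}:/tmp/ogr" \
osgeo/gdal:alpine-small-latest \
ogrinfo /tmp/ogr/andorra-latest.osm.pbf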
Set up custom OSM settings
The default OGR OSM driver configuration is too generic, and we want to focus on properties related to POI features. To do this we need a custom osmconf.ini file that specifies the tags we want to pull from the OSM points as dedicated fields (the remaining tags are grouped into the other_tags field).
[points]
osm_id=yes
osm_version=yes
osm_timestamp=yes
osm_uid=no
osm_user=no
osm_changeset=no
attributes=name,amenity,shop,leisure,office,wheelchair,phone,website,twitter,facebook
unsignificant=created_by,converted_by,source,time,ele,attribution
ignore=created_by,converted_by,source,time,ele,note,todo,openGeoDB:,fixme,FIXME
Create the mapping
The next step is to create a mapping file to customize the index fields that we will upload to our cluster. For this step, we will use the andorra-latest.osm.pbf dataset, since it is much smaller and executes quickly. Let's look at the command to run and then discuss its parameters and options:
$ docker run \
--rm -u $(id -u ${USER}):$(id -g ${USER}) `#1`\
-v "${PWD}:/tmp/ogr" `#2`\
-e OSM_CONFIG_FILE="/tmp/ogr/osmconf.ini" `#3`\
osgeo/gdal:alpine-small-latest \
ogr2ogr -progress \
-nln osm -overwrite `#4`\
-sql "select * from points where amenity is not null or shop is not null or leisure is not null or office is not null" `#5`\
-lco WRITE_MAPPING="/tmp/ogr/osm_mapping.json" `#6`\
-lco GEOM_MAPPING_TYPE="GEO_SHAPE" `#7`\
-lco GEOMETRY_NAME=location `#8`\
ES:${ELASTIC_URL} `#9`\
/tmp/ogr/andorra-latest.osm.pbf #10
- Execute an ephemeral Docker container with our own user and group.
- Mount our current folder as /tmp/ogr in the container.
- Pass the environment variable that configures the OSM driver.
- Define the new layer name as osm and overwrite it in subsequent executions.
- Filter only the points with the tags we are interested in; you can adapt this query to your own needs and filter by any other available attributes.
- Write the auto-generated mapping file into our mounted folder.
- Set up the geometry data field as a geo_shape.
- Name the geometry field location.
- Our cluster destination.
- Our testing data source.
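For reference, if you installed GDAL 3.1+ locally instead of using Docker, the container wrapping goes away and the same call looks roughly like this (a sketch; adjust the paths to your working directory):
$ OSM_CONFIG_FILE="./osmconf.ini" ogr2ogr -progress \
-nln osm -overwrite \
-sql "select * from points where amenity is not null or shop is not null or leisure is not null or office is not null" \
-lco WRITE_MAPPING="./osm_mapping.json" \
-lco GEOM_MAPPING_TYPE="GEO_SHAPE" \
-lco GEOMETRY_NAME=location \
"ES:${ELASTIC_URL}" \
./andorra-latest.osm.pbf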
This will generate an empty osm index in our cluster (no need to worry about it now) and, more importantly, a mapping file called osm_mapping.json. The next step is to open that file and adapt it to our needs. In this example, we convert the OSM tags into keyword fields so we can run aggregations on them, and we fix the date format for the osm_timestamp field.
{
"properties": {
"osm_id": { "type": "text" },
"osm_version": { "type": "integer" },
"osm_timestamp": {
"type": "date",
"format": "yyyy\/MM\/dd HH:mm:ss.SSS"
},
"name": { "type": "text" },
"amenity": { "type": "keyword" },
"shop": { "type": "keyword" },
"leisure": { "type": "keyword" },
"office": { "type": "keyword" },
"wheelchair": { "type": "keyword" },
"phone": { "type": "text" },
"website": { "type": "text" },
"twitter": { "type": "text" },
"facebook": { "type": "text" },
"other_tags": { "type": "text" },
"location": { "type": "geo_shape" }
},
"_meta": {
"fid": "ogc_fid",
"geomfields": { "location": "POINT" }
}
}
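This is the file that ogr2ogr will read in the next step, when we pass it through the MAPPING layer creation option, to create the index with exactly these mappings. Once the import has run, you can verify what the cluster actually stored with a quick curl call against the _mapping API (with the .env variables sourced):
$ curl -s "${ELASTIC_URL}/osm/_mapping"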
Write the dataset
Now we are ready to write our New York dataset. The command is very similar to the previous execution; just change the dataset and use the MAPPING layer creation option instead of WRITE_MAPPING.
$ docker run \
--rm -u $(id -u ${USER}):$(id -g ${USER}) \
-v "${PWD}:/tmp/ogr" \
-e OSM_CONFIG_FILE="/tmp/ogr/osmconf.ini" \
osgeo/gdal:alpine-small-latest \
ogr2ogr -progress \
-nln osm -overwrite \
-sql "select * from points where amenity is not null or shop is not null or leisure is not null or office is not null" \
-lco MAPPING="/tmp/ogr/osm_mapping.json" \
-lco GEOM_MAPPING_TYPE="GEO_SHAPE" \
-lco GEOMETRY_NAME=location \
ES:${ELASTIC_URL} \
/tmp/ogr/new-york-latest.osm.pbf
After a few moments, your index will be created. Now we can use ogrinfo again to check the osm index details:
$ docker run --rm \
osgeo/gdal:alpine-small-latest \
ogrinfo -noextent -summary "ES:${ELASTIC_URL}" osm
INFO: Open of `ES:https://<your-user>:<your-password>@<your-host>:<your-port>'
using driver `Elasticsearch' successful.
Layer name: osm
Geometry: Point
Feature Count: 84350
Layer SRS WKT:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
FID Column = ogc_fid
Geometry Column = location
_id: String (0.0)
amenity: String (0.0)
facebook: String (0.0)
leisure: String (0.0)
name: String (0.0)
office: String (0.0)
osm_id: String (0.0)
osm_timestamp: DateTime (0.0)
osm_version: Integer (0.0)
other_tags: String (0.0)
phone: String (0.0)
shop: String (0.0)
twitter: String (0.0)
website: String (0.0)
wheelchair: String (0.0)
Note: be sure to use the -noextent parameter; otherwise the command will download the whole dataset to compute the layer extent.
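If all you need is a quick document count without going through GDAL, the _count API does the job as well (again a plain curl sketch with the .env variables sourced):
$ curl -s "${ELASTIC_URL}/osm/_count"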
Visualize in Kibana
With our data in the index, we are ready to visualize it with Elastic Maps, but first, as always, we need to create a new index pattern using osm_timestamp as the date field.
We can go to the Discover application and explore how data has been added to OSM over the last few years.
And directly from this application, you can click on the location field and automatically create a map with the index as a new layer.
From here you can visualize your points based on different properties, search them, aggregate them using grids or heat maps, show them along with your own business data, and more. Check the Kibana Elastic Maps documentation for more details.
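The keyword fields we defined in the mapping are also useful outside of Kibana. For example, a terms aggregation on amenity returns the most common POI types straight from the command line (a curl sketch, not part of the Kibana workflow):
$ curl -s -H 'Content-Type: application/json' \
"${ELASTIC_URL}/osm/_search?size=0" -d '
{
  "aggs": {
    "top_amenities": {
      "terms": { "field": "amenity", "size": 10 }
    }
  }
}'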
Happy mapping!!