Dec 6th, 2020: [EN] Uploading data from OSM into Elasticsearch

Spanish version

One of the most common issues for Elastic Stack users who deal with geospatial data is how to ingest that data into Elasticsearch. You can check the Kibana 7.10 docs to learn about the different ways to achieve this. Some time ago we wrote a blog post introducing ogr2ogr, a tool from the GDAL library that helps ingest data from dozens of formats into Elasticsearch.

In this Advent Calendar post, we develop an example of this workflow using Docker to leverage the latest version of the GDAL tools, and OpenStreetMap as a popular source of Open Data Points of Interest.

Using Docker, we avoid issues with versions and local environments, and we can pull the latest GDAL image independently of our host operating system. Alternatively, you can install the GDAL library on your system, but make sure you are at least on version 3.1.
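If you are not sure which GDAL version a given image or local install provides, a quick check before going further looks like this (the exact version string will depend on the image you pull):

$ docker run --rm osgeo/gdal:alpine-small-latest ogrinfo --version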

Set up: create an Elastic deployment

Go to https://cloud.elastic.co and create a new deployment. For this test, we created a 7.9.3 cluster with one 4GB RAM Elasticsearch node and a 2GB RAM Kibana node. Put the credentials in a .env file like this:

ELASTIC_HOST=<your-cluster-host>
ELASTIC_PORT=9243
ELASTIC_USER=<your-user>
ELASTIC_PASSWORD=<your-password>

ELASTIC_URL="https://${ELASTIC_USER}:${ELASTIC_PASSWORD}@${ELASTIC_HOST}:${ELASTIC_PORT}"

Source the variables and check that the cluster and credentials are OK using the ogrinfo tool:

$ source .env
$ docker run --rm \
 osgeo/gdal:alpine-small-latest \
 ogrinfo -summary "ES:${ELASTIC_URL}"

INFO: Open of `ES:https://<your-user> :<your-host>:<your-port>'
      using driver `Elasticsearch' successful.
1: .kibana-event-log-7.9.3-000001 (None)
2: .apm-custom-link (None)
3: .kibana_task_manager_1 (None)
4: .apm-agent-configuration (None)
5: .kibana_1 (None)

All good: we can connect, and our cluster does not have any geospatial indices at the moment.
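If you prefer to double-check the credentials without GDAL, the same connection can be verified with plain curl; the cluster root endpoint returns a small JSON banner with the cluster name and version:

$ curl -s "${ELASTIC_URL}"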

Importing points of interest from OpenStreetMap

Get the data

We are going to use the small country of Andorra for an initial step, and then our main area of interest will be New York State. Both datasets are downloaded in PBF format from https://download.geofabrik.de.

$ wget https://download.geofabrik.de/europe/andorra-latest.osm.pbf
$ wget https://download.geofabrik.de/north-america/us/new-york-latest.osm.pbf
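Optionally, before writing any custom configuration, you can inspect the layers and default fields that the OGR OSM driver exposes for the smaller extract; this runs entirely locally and does not touch the cluster:

$ docker run --rm \
  -v "${PWD}:/tmp/ogr" \
 osgeo/gdal:alpine-small-latest \
 ogrinfo -so /tmp/ogr/andorra-latest.osm.pbf points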

Set up custom OSM settings

The default OGR OSM driver configuration is too generic, and we want to focus on properties related to POI features. To do this, we need a custom osmconf.ini file that specifies the tags we want to pull from the OSM points.

[points]
osm_id=yes
osm_version=yes
osm_timestamp=yes
osm_uid=no
osm_user=no
osm_changeset=no

attributes=name,amenity,shop,leisure,office,wheelchair,phone,website,twitter,facebook

unsignificant=created_by,converted_by,source,time,ele,attribution
ignore=created_by,converted_by,source,time,ele,note,todo,openGeoDB:,fixme,FIXME
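If you prefer to start from the full default configuration instead of writing the file from scratch, you can copy the osmconf.ini bundled with the GDAL image and trim it down; the path below is where recent GDAL builds usually ship it, but it may vary between distributions:

$ docker run --rm osgeo/gdal:alpine-small-latest \
   cat /usr/share/gdal/osmconf.ini > osmconf.ini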

Create the mapping

The next step is to create a mapping file to customize the index fields that we will upload to our cluster. For this step, we will use the andorra-latest.osm.pbf dataset since it's much smaller and will execute quickly. Let's look at the command to run and discuss its parameters and options:

$ docker run \
  --rm -u $(id -u ${USER}):$(id -g ${USER}) `#1`\
  -v "${PWD}:/tmp/ogr" `#2`\
  -e OSM_CONFIG_FILE="/tmp/ogr/osmconf.ini"  `#3`\
 osgeo/gdal:alpine-small-latest \
 ogr2ogr -progress \
  -nln osm -overwrite `#4`\
  -sql "select * from points where amenity is not null or shop is not null or leisure is not null or office is not null" `#5`\
  -lco WRITE_MAPPING="/tmp/ogr/osm_mapping.json" `#6`\
  -lco GEOM_MAPPING_TYPE="GEO_SHAPE" `#7`\
  -lco GEOMETRY_NAME=location `#8`\
  ES:${ELASTIC_URL} `#9`\
  /tmp/ogr/andorra-latest.osm.pbf #10
  1. Execute an ephemeral Docker container with our own user and group.
  2. Mount our current folder as /tmp/ogr in the container.
  3. Pass the environment variable that configures the OSM driver.
  4. Define the new layer name as osm and overwrite it in subsequent executions.
  5. Filter only the points with the tags we are interested in; you can adapt this query to your own needs and filter by any other available attributes.
  6. Write the auto-generated mapping file to our mounted folder.
  7. Set up the geometry data field as a geo_shape.
  8. Name the geometry field.
  9. Our cluster destination.
  10. Our testing data source.

This will generate an empty osm index in our cluster (no need to worry about it now) and, more importantly, a mapping file called osm_mapping.json. The next step is to open that file and adapt it to our needs. In this example, we convert the OSM tags into keyword fields so we can run aggregations on them, and we fix the date format for the osm_timestamp field.

{
    "properties": {
        "osm_id": { "type": "text" },
        "osm_version": { "type": "integer" },
        "osm_timestamp": {
            "type": "date",
            "format": "yyyy\/MM\/dd HH:mm:ss.SSS"
        },
        "name": { "type": "text" },
        "amenity": { "type": "keyword" },
        "shop": { "type": "keyword" },
        "leisure": { "type": "keyword" },
        "office": { "type": "keyword" },
        "wheelchair": { "type": "keyword" },
        "phone": { "type": "text" },
        "website": { "type": "text" },
        "twitter": { "type": "text" },
        "facebook": { "type": "text" },
        "other_tags": { "type": "text" },
        "location": { "type": "geo_shape" }
    },
    "_meta": {
        "fid": "ogc_fid",
        "geomfields": { "location": "POINT" }
    }
}
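If you want to verify that Elasticsearch accepts the edited mapping before running the real import, one option is to create a throwaway index with it and delete it right after; the scratch-osm index name here is just an example:

$ curl -s -X PUT "${ELASTIC_URL}/scratch-osm" \
   -H 'Content-Type: application/json' \
   -d "{\"mappings\": $(cat osm_mapping.json)}"
$ curl -s -X DELETE "${ELASTIC_URL}/scratch-osm"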

Write the dataset

Now we are ready to write our New York dataset. The command is very similar to the previous execution; just change the dataset and use the MAPPING layer creation option instead of WRITE_MAPPING.

$ docker run \
  --rm -u $(id -u ${USER}):$(id -g ${USER})\
  -v "${PWD}:/tmp/ogr" \
  -e OSM_CONFIG_FILE="/tmp/ogr/osmconf.ini"\
 osgeo/gdal:alpine-small-latest \
 ogr2ogr -progress \
  -nln osm -overwrite \
  -sql "select * from points where amenity is not null or shop is not null or leisure is not null or office is not null" \
  -lco MAPPING="/tmp/ogr/osm_mapping.json" \
  -lco GEOM_MAPPING_TYPE="GEO_SHAPE" \
  -lco GEOMETRY_NAME=location \
  ES:${ELASTIC_URL} \
  /tmp/ogr/new-york-latest.osm.pbf

After a few moments, your index will be created. Now we can use ogrinfo again to check the details of the osm index:

$ docker run --rm \
 osgeo/gdal:alpine-small-latest \
 ogrinfo -noextent -summary "ES:${ELASTIC_URL}" osm 

INFO: Open of `ES:https://<your-user> :<your-host>:<your-port>'
      using driver `Elasticsearch' successful.

Layer name: osm
Geometry: Point
Feature Count: 84350
Layer SRS WKT:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
FID Column = ogc_fid
Geometry Column = location
_id: String (0.0)
amenity: String (0.0)
facebook: String (0.0)
leisure: String (0.0)
name: String (0.0)
office: String (0.0)
osm_id: String (0.0)
osm_timestamp: DateTime (0.0)
osm_version: Integer (0.0)
other_tags: String (0.0)
phone: String (0.0)
shop: String (0.0)
twitter: String (0.0)
website: String (0.0)
wheelchair: String (0.0)

Note: be sure to use the -noextent parameter, or the command will download the whole dataset to compute the layer extent.
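As an extra cross-check, the feature count reported by ogrinfo should match the document count that Elasticsearch reports for the index (osm is the layer name we set with the -nln option earlier):

$ curl -s "${ELASTIC_URL}/osm/_count"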

Visualize in Kibana

With our data in the index, we are ready to visualize it with Elastic Maps, but the first step, as always, is to create a new index pattern, using osm_timestamp as the date field.

We can go to the Discover application and explore how data has been added to OSM over the years.

Directly from this application, you can click on the location field and automatically create a map with the index as a new layer.

From here you can visualize your points based on different properties, search and aggregate them using grids or heat maps, show them along with your own business data, and more. Check the Kibana Elastic Maps documentation for more details.

Happy mapping!!
