Indexing problems with GDAL/ogr2ogr for geo_point fields from GeoJSON files

I am having a problem indexing GeoJSON files that contain Point geometries with the GDAL/ogr2ogr library.

Points are not indexed the same way as when I upload the GeoJSON file with Kibana.

With ogr2ogr I get:

_source.geometry.coordinates

    "hits" : [
      {
        "_index" : "p_acceso",
        "_type" : "_doc",
        "_id" : "7dpIVX8B5BxBfAECiTdt",
        "_score" : 1.0,
        "_source" : {
          "id" : 1,
          "geometry" : {
            "type" : "Point",
            "coordinates" : [
              2.0628608796,
              41.5908310879
            ]
          },
          "id_via" : 414,
          "tipo_via" : "Carrer",
...

If I upload the same file through Kibana, the result is different (and reverse geocoding works well with it):

_source.coordinates

    "hits" : [
      {
        "_index" : "idx_cdv_acceso",
        "_type" : "_doc",
        "_id" : "CCczu34BSDMa7ItnC5xF",
        "_score" : 1.0,
        "_source" : {
          "coordinates" : [
            2.06286087960269,
            41.5908310878933
          ],
          "id" : 1,
          "id_via" : 414,
          "tipo_via" : "Carrer",
...

There are no examples in the documentation, although it refers to ogr2ogr as a way to upload and index data (ogr2ogr ingesting).

Does anyone know how to handle that? Is there any way to map a nested geometry field?

Any help would be appreciated.

Thanks!

Jordi.

The difference is that your call to ogr2ogr is apparently generating a geo_shape field type while the GeoJSON upload tool in Kibana is using geo_point.
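
One way to check which type each index actually ended up with is to look at the response of GET /<index>/_mapping. Below is a minimal Python sketch that walks such a response and lists the geo fields it finds. The sample response is made up for illustration (its field types are assumed from the explanation above, not copied from a real cluster), and geo_fields is a hypothetical helper name.

```python
GEO_TYPES = {"geo_point", "geo_shape"}


def _walk(properties, prefix=""):
    """Yield (dotted_field_path, type) for every geo field under `properties`."""
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") in GEO_TYPES:
            yield path, spec["type"]
        # Object fields keep their children under a nested "properties" key.
        if "properties" in spec:
            yield from _walk(spec["properties"], path + ".")


def geo_fields(mapping_response):
    """Return {index_name: {field_path: geo_type}} for a get-mapping response."""
    found = {}
    for index_name, body in mapping_response.items():
        properties = body.get("mappings", {}).get("properties", {})
        found[index_name] = dict(_walk(properties))
    return found


# Made-up response for the two indices discussed in this thread (assumption).
sample_response = {
    "p_acceso": {
        "mappings": {
            "properties": {
                "id": {"type": "long"},
                "geometry": {"type": "geo_shape"},
            }
        }
    },
    "idx_cdv_acceso": {
        "mappings": {
            "properties": {
                "id": {"type": "long"},
                "coordinates": {"type": "geo_point"},
            }
        }
    },
}

print(geo_fields(sample_response))
```

If the two indices report different geo types here, that would explain why the same query behaves differently on each of them.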

If you want ogr2ogr to map your geometries as geo_point, you need to use the layer creation option GEOM_MAPPING_TYPE, as stated in the driver documentation:

  • GEOM_MAPPING_TYPE=AUTO/GEO_POINT/GEO_SHAPE. Mapping type for geometry fields. Defaults to AUTO. GEO_POINT uses the geo_point mapping type. If used, the “centroid” of the geometry is used. This is the behavior of GDAL < 2.1. GEO_SHAPE uses the geo_shape mapping type, compatible of all geometry types. When using AUTO, for geometry fields of type Point, a geo_point is used. In other cases, geo_shape is used.
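
To make the quoted rule concrete, here is a small Python sketch that restates it as a function. It only paraphrases the documentation text above; it is not GDAL's implementation, and resolve_geom_mapping_type is a hypothetical name chosen for this example.

```python
def resolve_geom_mapping_type(option: str, geometry_type: str) -> str:
    """Return the Elasticsearch field type that the documented rule describes.

    `option` is the GEOM_MAPPING_TYPE value; `geometry_type` is a geometry
    type name such as "Point" or "Polygon".
    """
    option = option.upper()
    if option == "GEO_POINT":
        # Per the documentation, the "centroid" of the geometry is used.
        return "geo_point"
    if option == "GEO_SHAPE":
        return "geo_shape"
    if option == "AUTO":
        return "geo_point" if geometry_type == "Point" else "geo_shape"
    raise ValueError(f"unknown GEOM_MAPPING_TYPE: {option!r}")


for geometry_type in ("Point", "LineString", "Polygon"):
    print(geometry_type, "->", resolve_geom_mapping_type("AUTO", geometry_type))
```

Read this way, a layer made only of Point geometries should end up as geo_point with either AUTO or GEO_POINT, according to the documentation.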

In a 2020 advent calendar post I used that parameter for geo_shape, so maybe you can take a look at it for inspiration.


Hello Jorge!

Thank you for your answer!

I am sorry I wasn't very precise in my question about what I tried and the environment I am running.

Indeed, I tried different parameters from the GDAL driver documentation without any success: I used GEOM_MAPPING_TYPE, GEOMETRY_NAME and others.

Here is an example of the commands I tried:
ogr2ogr -progress -lco INDEX_NAME=p_accesos -lco OVERWRITE_INDEX=YES -lco GEOM_MAPPING_TYPE=GEO_POINT -f Elasticsearch http://nameserver/elasticsearch p_accesos.geojson

Or

ogr2ogr -progress -lco INDEX_NAME=p_accesos -lco OVERWRITE_INDEX=YES -lco GEOM_MAPPING_TYPE=GEO_POINT -lco GEOMETRY_NAME=geometry -f Elasticsearch http://nameserver/elasticsearch p_accesos.geojson

I am currently running GDAL 3.4.0 on Windows, with Elasticsearch 7.15.

Maybe I am missing something in my commands?

Or could the environment where I installed the GDAL libraries be the reason the driver is not mapping the GEOMETRY_NAME parameter I am passing correctly?

I think I've replicated your issue, since I was not able to generate a geo_point field in Elasticsearch from ogr2ogr either.

To confirm the issue in ogr2ogr, I followed the steps below to upload a dataset: I could not perform the upload as geo_point, but it worked fine as geo_shape. I'm using Docker to make sure nothing particular to my setup is involved.

(the dollar sign $ indicates a call from the command line)

  • Convert the airports shapefile to GeoJSON with mapshaper:
$ mapshaper -i ne_10m_airports.shp -o airports.geo.json
  • Export the URL to my ES instance
$ export ES_URL="http://user:password@host:port"
  • Check the URL is correct
$ curl -s "${ES_URL}"

{
  "name" : "master.elasticsearch",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "b2zx8N5SQF2RPV63DvbdXg",
  "version" : {
    "number" : "8.1.0-SNAPSHOT",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "20fbe125b9c9264e8da2e4690a2dde13bc1a26e0",
    "build_date" : "2022-02-23T00:17:56.526534263Z",
    "build_snapshot" : true,
    "lucene_version" : "9.0.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}
  • Generate the index mapping file selecting only a few interesting fields:
$ docker run --rm  \
  -v ${PWD}:/app \
  --network host \
  --user "$(id -u):$(id -g)" \
  osgeo/gdal:alpine-small-latest \
  ogr2ogr -overwrite -lco INDEX_NAME=airports \
    -select "scalerank,featurecla,type,name,abbrev,wikidataid" \
    -lco NOT_ANALYZED_FIELDS={ALL} \
    -lco WRITE_MAPPING=/app/airports.mapping.json \
    ES:${ES_URL} \
    /app/airports.geo.json
  • Format the mapping file for easier editing:
$ jq . airports.mapping.json > airports.mapping.pretty.json
  • Adjust the mapping file to use a text type for the name field and geo_shape for the geometry field:
{
  "properties": {
    "scalerank": { "type": "keyword" },
    "featurecla": { "type": "keyword" },
    "type": { "type": "keyword" },
    "name": { "type": "text" },
    "abbrev": { "type": "keyword" },
    "wikidataid": { "type": "keyword" },
    "geometry": { "type": "geo_shape" }
  }
}
  • Generating the mapping file also created an index; remove it:
$ curl -X DELETE "${ES_URL}/airports"
  • Upload the dataset:
$ docker run --rm  \
  -v ${PWD}:/app \
  --network host \
  osgeo/gdal:alpine-small-latest \
  ogr2ogr -progress -skipfailures \
    -select "scalerank,featurecla,type,name,abbrev,wikidataid" \
    -lco INDEX_NAME=airports \
    -lco MAPPING=/app/airports.mapping.pretty.json \
    ES:${ES_URL} \
    /app/airports.geo.json
  • Execute a sample geospatial search
$ curl -s -X POST "${ES_URL}/airports/_search?size=1" \
  -H 'Content-Type: application/json' \
  -d '{"query":{"geo_distance":{"distance":"15km","geometry":{"lat":40.4,"lon":-3.5}}}}' |\
  jq '.hits.hits[]'
{
  "_index": "airports",
  "_id": "rlstan8B89IY_3rvfWIm",
  "_score": 1,
  "_source": {
    "ogc_fid": 887,
    "geometry": {
      "type": "Point",
      "coordinates": [
        -3.5690266546,
        40.4681282734
      ]
    },
    "scalerank": 2,
    "featurecla": "Airport",
    "type": "major",
    "name": "Madrid Barajas",
    "abbrev": "MAD",
    "wikidataid": "Q166276"
  }
}
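
The manual mapping edit in the steps above could also be scripted. The following Python sketch assumes the mapping file has a top-level "properties" object like the adjusted file shown earlier; the input dictionary is a hypothetical stand-in (the thread does not show the generated file, so the geometry field is left out of it), and override_field_types is a made-up helper name.

```python
import copy
import json


def override_field_types(mapping, overrides):
    """Return a copy of `mapping` with the given fields set to new types."""
    result = copy.deepcopy(mapping)
    properties = result.setdefault("properties", {})
    for field, new_type in overrides.items():
        # Replace the whole definition so options tied to the old type
        # do not survive the change.
        properties[field] = {"type": new_type}
    return result


# Hypothetical stand-in for the generated mapping; the real file may
# contain more fields and keys than shown here.
generated = {
    "properties": {
        "scalerank": {"type": "keyword"},
        "name": {"type": "keyword"},
    }
}

adjusted = override_field_types(
    generated, {"name": "text", "geometry": "geo_shape"}
)
print(json.dumps(adjusted, indent=2))
```

The result can be written to a file with json.dump and passed to ogr2ogr through the MAPPING layer creation option, as in the upload step above.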

Hope this helps to clarify a successful workflow with ogr2ogr and Elasticsearch.
