Index twitter data with coordinates geo_point parse exception

Bruno_Lavoie · November 27, 2017, 5:26pm

Hello,

I'm trying to index some twitter data without success because of the coordinates field that I want to use as geo_point.

Here is my template:

https://gist.github.com/blavoie/b98c9fa3d279aab1ebc7fe8430c5dbb4

template.json

{  
  "template":"ul-twitter-*",
  "order":10,
  "settings":{  
    "number_of_shards":10,
    "index.mapping.total_fields.limit":2500
  },
  "mappings":{  
    "default":{  
      "properties":{

This file has been truncated.

Here is a document:

https://gist.github.com/blavoie/168cc505954062c0920904bf44bb2c99

tweet.json

{
  "created_at" : "Fri Nov 24 20:00:08 +0000 2017",
  "id" : 934149657885802496,
  "id_str" : "934149657885802496",
  "text" : "\"TRUMP is my President\"! and the Temputure in Crab Orchard is: Temp: 59.3°F Wind:2.5mph Pressure: 28.73in Falling Rain Today 0.00in.",
  "source" : "<a href=\"http://sandaysoft.com/\" rel=\"nofollow\">Sandaysoft Cumulus</a>",
  "truncated" : false,
  "in_reply_to_status_id" : null,
  "in_reply_to_status_id_str" : null,
  "in_reply_to_user_id" : null,

This file has been truncated.

When I run the steps to delete the index, replace the template, and insert my document, I get an error:

{"error":{"root_cause":[{"type":"parse_exception","reason":"geo_point expected"}],"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"parse_exception","reason":"geo_point expected"}},"status":400}

Steps:

HOSTNAME=localhost
curl -XDELETE http://$HOSTNAME:9200/ul-twitter-test
curl -XPUT http://$HOSTNAME:9200/_template/ul-twitter     -d @templates/ul-twitter.basic.json
curl -XPUT http://$HOSTNAME:9200/ul-twitter-test/tweets/1 -d @samples/tweet-with-geopoint.json

Currently running with version 5.6.4.

Any clues?

Thanks
Bruno

spinscale · November 28, 2017, 7:53am

Hey,

you are actually not indexing a geo_point (that consists only of a lon and lat pair), but a geo_shape, which in your example incidentally turns out to be a point, but could also be a line or a polygon.

In order to fix this, you could just use this in your tweet

coordinates: { lat: 11, lon: 12 }

and then everything should work

--Alex

Bruno_Lavoie · November 28, 2017, 2:37pm

Hello @spinscale,

Thanks for your response, but as per documentation geo_point type can be passed as an array:
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/geo-point.html

Snippet:

PUT my_index/my_type/4
{
  "text": "Geo-point as an array",
  "location": [ -71.34, 41.12 ] 
}

What am I missing?

spinscale · November 28, 2017, 2:50pm

Nothing. Just because it works does not mean it is intuitive to me.

The person coming after you has to know which value is the latitude and which is the longitude. That person also needs to know that if you write a geo point as a string like 41.12, -71.34, you need to reverse the order to make it work. Because of that, I always prefer to mark which field represents what by naming it appropriately. Internally, every one of those three representations gets stored the same way, so it does not make a difference when searching.

personal preference, you may ignore
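
To make the ordering difference concrete, here is a minimal Python sketch that normalizes the three representations discussed above into a (lat, lon) pair. The helper name `to_lat_lon` is hypothetical and not part of any Elasticsearch client; the sketch also ignores the geohash string form that geo_point accepts.

```python
def to_lat_lon(value):
    """Return (lat, lon) from one of three geo_point input forms."""
    if isinstance(value, dict):
        # Object form: the keys are explicit, so order does not matter.
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, str):
        # String form is "lat,lon" (geohash strings are not handled here).
        lat, lon = (float(part) for part in value.split(","))
        return lat, lon
    if isinstance(value, (list, tuple)):
        # Array form is [lon, lat], the reverse of the string form.
        lon, lat = value
        return float(lat), float(lon)
    raise TypeError(f"unsupported geo_point representation: {value!r}")


# All three calls describe the same point and print (41.12, -71.34).
print(to_lat_lon({"lat": 41.12, "lon": -71.34}))
print(to_lat_lon("41.12,-71.34"))
print(to_lat_lon([-71.34, 41.12]))
```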

Bruno_Lavoie · November 28, 2017, 4:33pm

Yes, makes sense about personal preferences, but I meant that the tweet payload comes with this:

  "coordinates" : {
    "type" : "Point",
    "coordinates" : [ -84.51194444, 37.47361111 ]
  },

and my mapping template sets the type of coordinates.coordinates to geo_point, yet it doesn't work, even though this array representation should be handled correctly by Elasticsearch (as per the docs).

.... I'm really puzzled

Thanks

spinscale · November 28, 2017, 7:15pm

no, this representation only works for a geo shape, a point needs to be modelled like this

coordinates: [ -84.51194444, 37.47361111 ]

or

coordinates: "37.47361111, -84.51194444"

or

coordinates: { lon: -84.51194444, lat: 37.47361111 }

the structure that you got resembles a geo shape because it contains the type and coordinates fields inside of the coordinates field.

hope this makes sense
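
One way to follow this suggestion is to reshape the tweet before indexing it. The sketch below is a hedged illustration, assuming you would rather change the document than the mapping; the function name `tweet_point_to_geo_point` is made up for this example, and treating a null `coordinates` value as "no location" is an assumption about the payload.

```python
def tweet_point_to_geo_point(coordinates):
    """Convert a GeoJSON-style Point object into a {"lat", "lon"} dict."""
    if coordinates is None:
        # Assumption: tweets without a location carry null here.
        return None
    if coordinates.get("type") != "Point":
        raise ValueError(f"expected a Point, got {coordinates.get('type')!r}")
    # The inner array is ordered [lon, lat].
    lon, lat = coordinates["coordinates"]
    return {"lat": lat, "lon": lon}


tweet = {
    "coordinates": {
        "type": "Point",
        "coordinates": [-84.51194444, 37.47361111],
    }
}
# Replace the nested object with a flat, explicitly named point.
tweet["coordinates"] = tweet_point_to_geo_point(tweet["coordinates"])
print(tweet)
```

With the document flattened this way, a top-level `coordinates` field mapped as geo_point receives an explicit lat/lon object instead of a nested type/coordinates structure.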

Bruno_Lavoie · November 28, 2017, 7:59pm

Sorry, not clear yet...

The field coordinates.coordinates is an array, the template defines this field as a geo_point, and per the docs an array is a valid representation of a geo_point...

Is it because it must not be nested in an object, and has to be at the root?

Jim_Weng · November 28, 2017, 8:03pm

I had the same problem as you.
I fixed it with the template below.
Hope that works for you too.

github.com

jimweng/elastic-search/blob/master/template.json

{
  "mappings": {
    "_default_": {
      "properties": {
        "coordinates":{
          "properties": {
            "coordinates": {
              "type": "geo_point"
            },
            "type":{
              "type": "text",
              "fields": {
                "keyword":{
                  "type":"keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },

This file has been truncated.
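
When comparing two templates like these, it can help to check which type a nested field actually resolves to. The following is a small Python sketch with a hypothetical helper, `field_type`, that walks the `properties` tree of a mapping by dotted path. It only inspects a local dict copied from the visible part of the template above and does not talk to Elasticsearch.

```python
def field_type(properties, dotted_path):
    """Return the declared type of a field, or None if it is not declared."""
    node = {"properties": properties}
    for name in dotted_path.split("."):
        # Descend one level through the "properties" of the current node.
        node = node.get("properties", {}).get(name)
        if node is None:
            return None
    return node.get("type")


# The visible part of the template's "properties" section.
properties = {
    "coordinates": {
        "properties": {
            "coordinates": {"type": "geo_point"},
            "type": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            },
        }
    }
}

print(field_type(properties, "coordinates.coordinates"))  # geo_point
print(field_type(properties, "coordinates.type"))         # text
print(field_type(properties, "coordinates"))              # None (object, no explicit type)
```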

Bruno_Lavoie · November 29, 2017, 2:07pm

Thanks @Jim_Weng you are right!

I still don't get what Elasticsearch does differently between your template and mine: very simple details! All this in a curly brace party.

You saved me hours.

system · December 27, 2017, 2:08pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
