Writing rich datatypes from Pig to ElasticSearch type with mapping (Geoshape)

Hello,
I am trying to write a Geoshape from Pig to ElasticSearch using EsStorage.
(The Index has a mapping for the geoshape field)

I am able to write a document with that field identified as a "GeoShape" using "curl", but from Pig the write fails with:
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [...] returned Bad Request(400) - failed to parse [affectedGeoArea];shape must be an object consisting of type and coordinates; Bailing out..

Passing a Pig chararray with JSON contents didn't do the trick either (as in GeoPoint), so it seems that I need to create a tuple that describes the type, coordinates and radius.
I read the excellent documentation page at https://www.elastic.co/guide/en/elasticsearch/hadoop/current/pig.html, however I could not find a code example.

  1. A sample record from the input file looks like this:
    circle|[-45.0,45.0]

  2. Code sample (which fails with the error quoted above):

REGISTER ./elasticsearch-hadoop-2.2.0.jar;

loadedRecords = LOAD 'inputFile.csv' USING PigStorage('|') AS (type:chararray,coordinates:chararray);

elasticData = foreach loadedRecords GENERATE (type,coordinates) AS affectedGeoArea:tuple(type:chararray,coordinates:chararray);

DESCRIBE elasticData ;

DUMP elasticData;

STORE elasticData INTO 'myindex/mytype' USING org.elasticsearch.hadoop.pig.EsStorage('es.http.retries=10','es.nodes=localhost','es.index.auto.create=true','es.mapping.pig.tuple.use.field.names=false');

I would appreciate it if you could share a Pig code example or, even better, add one to the documentation.

Thanks

  1. Consider using the latest stable version of ES-Hadoop, namely 2.3.x.
  2. Enable logging (also in the docs) to see the REST HTTP queries generated. This is a great way to map things back to ES.
  3. When using tuples (as you are), disable the use of field names (which you already do).

Hope this helps,

Thanks Costin,
We debugged this on both the es-hadoop (client) side and the ES server side, but we still didn't find a suitable datatype in Pig to represent a GeoShape (I can explain some more if anyone is interested).

Anyway we changed our Pig script to build a single JSON string from all the relevant input fields (using a UDF), then we added the EsStorage property: 'es.input.json=true'

STORE elasticData INTO 'myindex/mytype' USING org.elasticsearch.hadoop.pig.EsStorage('es.http.retries=10','es.nodes=localhost','es.index.auto.create=true','es.mapping.pig.tuple.use.field.names=false','es.input.json=true');

That did the trick, and the document was properly saved with a GeoShape field.
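
For anyone following the same route, here is a minimal sketch of the JSON-building step. The actual UDF is not shown in this thread, so everything below is an assumption: the function name to_geoshape_json is hypothetical, the field name affectedGeoArea is taken from the error message above, and the radius value "100m" is a placeholder, since the sample record carries no radius. It only illustrates the shape of the string that would be handed to EsStorage when 'es.input.json=true' is set.

```python
import json


def to_geoshape_json(line, field="affectedGeoArea", radius="100m"):
    """Turn a record such as 'circle|[-45.0,45.0]' into a JSON document string.

    Hypothetical helper: 'field' and 'radius' are illustrative defaults,
    not values confirmed anywhere in this thread.
    """
    # Split the pipe-delimited record into the shape type and the raw coordinates.
    shape_type, raw_coords = line.strip().split("|", 1)

    # The coordinates column is already a JSON array, so parse it as such.
    shape = {"type": shape_type, "coordinates": json.loads(raw_coords)}

    # A circle additionally needs a radius; other shape types are left as-is.
    if shape_type == "circle":
        shape["radius"] = radius

    # Nest the shape under the mapped field so it is an object with
    # 'type' and 'coordinates', as the error message asks for.
    return json.dumps({field: shape}, sort_keys=True)


if __name__ == "__main__":
    print(to_geoshape_json("circle|[-45.0,45.0]"))
```

The point of the design is that the shape reaches Elasticsearch as a nested JSON object rather than as a Pig tuple or a quoted string, which is what the "shape must be an object consisting of type and coordinates" message was complaining about.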

By the way, have you seen this thread? It looks to be directly related to the same issue.