Writing rich datatypes from Pig to ElasticSearch type with mapping (Geoshape)

(Liadl) #1

I am trying to write a Geoshape from Pig to ElasticSearch using EsStorage.
(The Index has a mapping for the geoshape field)

I can write a document with that field identified as a "GeoShape" using "curl", but from Pig the write fails with:
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [...] returned Bad Request(400) - failed to parse [affectedGeoArea];shape must be an object consisting of type and coordinates; Bailing out..

A Pig chararray holding the JSON content did not work either (although that approach works for GeoPoint), so it seems that I need to create a tuple that describes type, coordinates and radius.
I read the excellent documentation page at https://www.elastic.co/guide/en/elasticsearch/hadoop/current/pig.html, but I could not find a code example for this case.

  1. A sample record from the input file looks like this:

  2. Code sample (which fails with the error quoted above):

REGISTER ./elasticsearch-hadoop-2.2.0.jar;

loadedRecords = LOAD 'inputFile.csv' USING PigStorage('|') AS (type:chararray,coordinates:chararray);

elasticData = foreach loadedRecords GENERATE (type,coordinates) AS affectedGeoArea:tuple(type:chararray,coordinates:chararray);

DESCRIBE elasticData ;

DUMP elasticData;

STORE elasticData INTO 'myindex/mytype' USING org.elasticsearch.hadoop.pig.EsStorage('es.http.retries=10','es.nodes=localhost','es.index.auto.create=true','es.mapping.pig.tuple.use.field.names=false');

I would appreciate it if you could share a Pig code example or, even better, add one to the documentation.


(Costin Leau) #2
  1. Consider using the latest stable version of ES-Hadoop, namely 2.3.x.
  2. Enable logging (also covered in the docs) to see the REST HTTP queries that are generated. This is a great way to map things back to ES.
  3. When using tuples (as you are), disable the use of field names (which you already do).

Hope this helps,

(Liadl) #3

Thanks Costin,
We debugged this on both the es-hadoop (client) side and the ES server side, but we still did not find a suitable datatype in Pig to represent a GeoShape. (I can explain some more if anyone is interested.)

In the end we changed our Pig script to build a single JSON string from all the relevant input fields (using a UDF), and then added the EsStorage property 'es.input.json=true':

STORE elasticData INTO 'myindex/mytype' USING org.elasticsearch.hadoop.pig.EsStorage('es.http.retries=10','es.nodes=localhost','es.index.auto.create=true','es.mapping.pig.tuple.use.field.names=false','es.input.json=true');

That did the trick: the document was saved properly and contains a GeoShape field :)
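
The UDF itself is not shown in this thread. As a rough illustration of the idea only, here is a minimal Python sketch of the string-building step. The function name `to_geoshape_doc`, the sample coordinates, and the assumption that the `coordinates` column already holds JSON array text are all hypothetical; only the field name `affectedGeoArea` and the `type`, `coordinates` and `radius` keys come from the posts above. Wrapping this as an actual Pig UDF (for example via Jython) is left out.

```python
import json


def to_geoshape_doc(shape_type, coordinates_json, field="affectedGeoArea", radius=None):
    """Return one JSON document string with a shape object under `field`.

    Assumption: `coordinates_json` is JSON array text such as "[34.78, 32.08]".
    """
    shape = {
        "type": shape_type,
        # Parse the text so coordinates are emitted as a JSON array, not a string.
        "coordinates": json.loads(coordinates_json),
    }
    if radius is not None:
        # Only some shapes (e.g. a circle) carry a radius.
        shape["radius"] = radius
    return json.dumps({field: shape})


# Hypothetical input line in the '|'-delimited layout used by the Pig script.
line = "point|[34.78, 32.08]"
shape_type, coords = line.split("|", 1)
print(to_geoshape_doc(shape_type, coords))
```

With 'es.input.json=true', es-hadoop expects each record to already be a complete JSON document, so the relation passed to STORE would carry this string as its single chararray field.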

(Costin Leau) #4

By the way, have you seen this thread? It looks to be directly related to the same issue.
