Store geoshape within Pig using ESHadoop


(Roey) #1

I'm trying to store a geoshape (like the following) in ES via Pig using org.elasticsearch.hadoop.pig.EsStorage (2.2.0):

    {
        "location" : {
            "type" : "circle",
            "coordinates" : [-45.0, 45.0],
            "radius" : "100m"
        }
    }

or:

    {
        "location" : {
            "type" : "polygon",
            "orientation" : "clockwise",
            "coordinates" : [
                [ [-177.0, 10.0], [176.0, 15.0], [172.0, 0.0], [176.0, -15.0], [-177.0, -10.0], [-177.0, 10.0] ],
                [ [178.2, 8.2], [-178.8, 8.2], [-180.8, -8.8], [178.2, 8.8] ]
            ]
        }
    }

We tried the following:

    REGISTER ./elasticsearch-hadoop-2.2.0.jar;

    loadedRecords = LOAD 'inputFile.csv' USING PigStorage('|') AS (type:chararray,coordinates:bag{(float,float)},radius:chararray);

    elasticData = foreach loadedRecords GENERATE (type ,{(45.0f,46.0f)},radius) AS geoArea:tuple(type:chararray,coordinates:bag{(float,float)},radius:chararray);

    DESCRIBE elasticData ;

    DUMP elasticData;

    STORE elasticData INTO 'myindex/mytype' USING org.elasticsearch.hadoop.pig.EsStorage('es.http.retries=10','es.nodes=localhost','es.index.auto.create=true','es.mapping.pig.tuple.use.field.names=false');

This failed with an error: while parsing the coordinates, the parser encountered a non-numeric value (the type was parsed to CIRCLE).

We also tried the following, which was problematic as well:

    REGISTER ./elasticsearch-hadoop-2.2.0.jar;

    loadedRecords = LOAD 'inputFile.csv' USING PigStorage('|') AS (type:chararray,coordinates:chararray,radius:chararray);

    --elasticData = foreach loadedRecords GENERATE (type ,{(45.0f,46.0f)} ,radius) AS geo:tuple(type:chararray,coordinates:bag{(float,float)},radius:chararray);
    elasticData = foreach loadedRecords GENERATE TOMAP('type','circle','coordinates','[40.0f,46.0f]','radius','150m') AS geo:map[chararray];
    DESCRIBE elasticData ;

    DUMP elasticData;

    STORE elasticData INTO 'myindex/mytype' USING org.elasticsearch.hadoop.pig.EsStorage('es.http.retries=10','es.nodes=host','es.index.auto.create=true','es.mapping.pig.tuple.use.field.names=false');

This time we received:

    Caused by: com.fasterxml.jackson.core.JsonParseException: Current token (END_OBJECT) not numeric, can not use numeric value accessors
     at [Source: org.elasticsearch.common.io.stream.InputStreamStreamInput@20063f76; line: 1, column: 83]
        at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1581)
        at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:533)
        at com.fasterxml.jackson.core.base.ParserBase._parseNumericValue(ParserBase.java:799)
        at com.fasterxml.jackson.core.base.ParserBase.getDoubleValue(ParserBase.java:713)
        at org.elasticsearch.common.xcontent.json.JsonXContentParser.doDoubleValue(JsonXContentParser.java:180)
        at org.elasticsearch.common.xcontent.support.AbstractXContentParser.doubleValue(AbstractXContentParser.java:184)
        at org.elasticsearch.common.xcontent.support.AbstractXContentParser.doubleValue(AbstractXContentParser.java:174)
        at org.elasticsearch.common.geo.builders.ShapeBuilder.parseCoordinates(ShapeBuilder.java:248)
        at org.elasticsearch.common.geo.builders.ShapeBuilder.access$100(ShapeBuilder.java:46)
        at org.elasticsearch.common.geo.builders.ShapeBuilder$GeoShapeType.parse(ShapeBuilder.java:744)
        at org.elasticsearch.common.geo.builders.ShapeBuilder.parse(ShapeBuilder.java:291)

Has anyone stored a geoshape to ES using Pig who can help us? Or can you advise us on how to do this another way?

Please see the comments on the Stack Overflow post for more detailed information.

Thanks a lot,
Roey.


(Costin Leau) #2

Hi,

I've read the SO thread, and the discussion stopped around the generated HTTP request, namely that the coordinates are handled as a string instead of as an array of floats.
To quote json:

    ([type#circle,coordinates#[40.0f,46.0f],radius#150m])

For the dump result I showed, ES received as an input (I debugged and found that) the following line:

    [{"Geo":{"radius":"150m","type":"circle","coordinates":"[40.0f,46.0f]"}}]}

Line that works:

    {
        "location" : {
            "type" : "circle",
            "coordinates" : [40.0, 46.0],
            "radius" : "150m"
        }
    }

Can you confirm the latest Pig script that you have in place? Currently the code expects the map to be consistent (the keys are all strings and the values are all of the same type), which might be a wrong assumption (it depends on whether the Pig schema provides additional information).
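
To make the string-versus-array difference concrete, here is a minimal Python sketch that has nothing to do with ES-Hadoop internals. It only serializes the two maps from the quote above with the standard json module; the variable names are made up for illustration.

```python
import json

# Mirrors the map built with TOMAP in the Pig script: every value is a
# chararray, so the coordinates are just text (assumption: illustration only).
as_strings = {"type": "circle", "coordinates": "[40.0f,46.0f]", "radius": "150m"}

# The shape that works: coordinates is a real array of numbers.
as_numbers = {"type": "circle", "coordinates": [40.0, 46.0], "radius": "150m"}

rejected = json.dumps({"Geo": as_strings})
accepted = json.dumps({"location": as_numbers})

print(rejected)  # coordinates is emitted as a quoted JSON string
print(accepted)  # coordinates is emitted as a JSON array of numbers
```

In the first document the geo_shape parser finds a string where it expects numbers, which matches the "not numeric" error in the stack trace.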

Cheers,


(Costin Leau) #3

By the way, I've raised this issue.

Cheers,


(Costin Leau) #4

Based on the docs and some tests, it does look like a map is expected to have all its values of the same type. However, this applies only if the type is declared (which is odd: what happens if the type is not declared?):

(Optional) The datatype (all types allowed, bytearray is the default). The type applies to the map value only; the map key is always type chararray (see Map). If a type is declared then ALL values in the map must be of this type.

This corner-case causes the map values to be handled as having the same type. I've fixed this in master and backported to 2.x.
A new snapshot for both should be up in about 10 minutes; please try it out and report back.
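
As a rough illustration of the corner case described above, the following Python toy model shows how a declared map value type can flatten every value to that type. It is a sketch under assumptions, not ES-Hadoop's actual code: serialize_map and its declared_value_type parameter are hypothetical names.

```python
import json

def serialize_map(values, declared_value_type=None):
    """Toy model only (hypothetical helper, not ES-Hadoop code).

    If the schema declares a value type for the map, every value is
    written as that type; otherwise each value keeps its runtime type.
    """
    if declared_value_type == "chararray":
        # Declared type: all values are forced to strings.
        return json.dumps({k: str(v) for k, v in values.items()})
    # No declared type: lists stay arrays, numbers stay numbers.
    return json.dumps(values)

geo = {"type": "circle", "coordinates": [40.0, 46.0], "radius": "150m"}

typed = serialize_map(geo, declared_value_type="chararray")  # like geo:map[chararray]
untyped = serialize_map(geo)                                 # like geo:map[]

print(typed)
print(untyped)
```

With the declared type, the coordinates come out as a single string; without it, they stay an array of numbers, which is the form the geo_shape parser accepts.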

