Is it possible to write to ES from a JSON file in HDFS when the file has different keys in different records?


(siva mannem) #1

Hi,

My JSON file looks like this:
+++++++++++++++++++
{"k1":"v1" , "k2":"v2" , "k3":"v3" , "k4":"v4" , "k5":"v5"}

{"k12":"v11" , "k23":"v22" , "k34":"v33" , "k45":"v44" , "k56":"v55"}

{"k1":"v111" , "k2":"v222" , "k3":"v333" , "k4":"v444" , "k5":"v555"}

{"k123":"v12" , "k234":"v23" , "k345":"v34" , "k456":"v45" , "k567":"v56"}
+++++++++++++++++++++

My Pig script looks like this:
+++++++++++++++++++++++++++
REGISTER /usr/lib/gphd/pig/elasticsearch-hadoop-1.3.0.M2-yarn.jar;

DEFINE ESTOR org.elasticsearch.hadoop.pig.EsStorage('es.nodes=gateway1 , es.resource=ca/sf');

A = LOAD '/elastic_search/in_dir/' using JsonLoader('k1:chararray,k2:chararray,k3:chararray,k4:chararray,k5:chararray');

B = FOREACH A GENERATE k1, k3, k5;
+++++++++++++++++++++++++++++

I am expecting output like this:
+++++++++++++++
(v1,v3,v5)
(v111,v333,v555)
++++++++++++++++++

But I am getting output like this:
++++++++++++
(v1,v3,v5)
(v11,v33,v55)
(v111,v333,v555)
++++++++++++++

Is there any way to ignore the second record, since it contains none of the keys k1, k3
and k5?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/410f55ac-7e43-4789-83dc-eb4958fa2d55%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(siva mannem) #2

Sorry, a correction: the output I am actually getting is this (the fourth record comes through as well):
++++++++++++
(v1,v3,v5)
(v11,v33,v55)
(v111,v333,v555)
(v12,v34,v56)
++++++++++++++
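
For what it's worth, the output above is consistent with values being assigned to the schema by position rather than by key name, so every five-field record yields a tuple regardless of what its keys are called. That is an assumption about how the built-in JsonLoader behaves in this Pig version, not something confirmed in this thread. As a minimal sketch of the key-based filtering being asked for, the Python below keeps only records that contain all of k1, k3 and k5. The names `REQUIRED_KEYS` and `keep_records_with_keys` are hypothetical, and the sample lines are copied from the JSON file shown in the first post.

```python
import json

# Keys the Pig script projects (k1, k3, k5); a record missing any of them is dropped.
REQUIRED_KEYS = ("k1", "k3", "k5")


def keep_records_with_keys(lines, required=REQUIRED_KEYS):
    """Yield a tuple of the required values for each JSON line that has every required key."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip the blank lines between records
        record = json.loads(line)
        # Match on key names, not on field position.
        if all(key in record for key in required):
            yield tuple(record[key] for key in required)


# Sample records taken from the JSON file in the first post.
sample = [
    '{"k1":"v1" , "k2":"v2" , "k3":"v3" , "k4":"v4" , "k5":"v5"}',
    '',
    '{"k12":"v11" , "k23":"v22" , "k34":"v33" , "k45":"v44" , "k56":"v55"}',
    '',
    '{"k1":"v111" , "k2":"v222" , "k3":"v333" , "k4":"v444" , "k5":"v555"}',
    '',
    '{"k123":"v12" , "k234":"v23" , "k345":"v34" , "k456":"v45" , "k567":"v56"}',
]

result = list(keep_records_with_keys(sample))
for row in result:
    print(row)
```

On the sample data this keeps only the first and third records, which matches the expected output in the first post. Inside Pig, the same idea would presumably mean loading each record as a map and filtering on key presence before the FOREACH, but that part is not worked out here.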

(In reply to post #1 by siva mannem, Tuesday, April 1, 2014 11:14:40 AM UTC-7.)


