I've fixed this in master and pushed a snapshot with the fix [1]. Let me know how it works for you.
Costin,
the integer data type doesn't work at all. I've added logs to the BufferedRestClient addtoIndex method: the first log is the
writable object's toString, and the second log is the mapper's writeValueAsString output. The integer data type used to work
fine before this commit, though.
2013-04-30 09:57:11,466 INFO org.elasticsearch.hadoop.rest.BufferedRestClient: Writable{rid=[B@1a3650ed, rdata={[B@4e0a2a38=8, [B@7d59ea8e=9}, rdate=1234, mapids=[[B@63fb050c, [B@75088a1b, [B@3a32ea4]}
2013-04-30 09:57:11,536 INFO org.elasticsearch.hadoop.rest.BufferedRestClient: ES index query{"index":{}}
{"rid":"AAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=","rdata":{"[B@4e0a2a38":"8","[B@7d59ea8e":"9"},"rdate":"1234","mapids":["AAAAAgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=","AAAAAwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=","AAAABAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="]}
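A quick way to see what those base64 strings are: decoding the rid value by hand suggests it is the Writable's raw byte buffer being serialized rather than the decoded number, with the row's int value as a big-endian prefix followed by padding. A minimal sketch (an illustration against the log value above, not code from the connector):

```java
import java.util.Base64;

public class DecodeRid {
    public static void main(String[] args) {
        // The "rid" value from the "ES index query" log line above
        byte[] raw = Base64.getDecoder()
                .decode("AAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=");
        // The first four bytes read as a big-endian int; the remaining
        // bytes are all zero, which looks like buffer padding.
        int rid = ((raw[0] & 0xff) << 24) | ((raw[1] & 0xff) << 16)
                | ((raw[2] & 0xff) << 8) | (raw[3] & 0xff);
        System.out.println(rid + " of " + raw.length + " bytes"); // 1 of 32 bytes
    }
}
```

This decodes to 1, which matches the row's rid value; the mapids entries decode to 2, 3, and 4 the same way.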
Please let me know if you need more information; I can add more logs.
Thanks,
Abhishek
On Monday, April 29, 2013 5:01:03 PM UTC-6, Abhishek Andhavarapu wrote:
Costin, thanks. It works great. The only problem I see is when the map key/value or array data type is int: I see random
values in ES. It works great with strings. I know I can force the mapping on the ES side to be int, but I'm just wondering
if it's a simple fix.
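For reference, forcing the types on the ES side would look something like the mapping below (a sketch assuming the thread's radio index and artists type, with field names taken from the logs; adjust to your actual schema):

```json
{
  "artists": {
    "properties": {
      "rid":    { "type": "integer" },
      "mapids": { "type": "integer" },
      "rdate":  { "type": "string" }
    }
  }
}
```

In ES there is no separate array type; mapids as "integer" covers a list of ints.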
On Monday, April 29, 2013 11:35:12 AM UTC-6, Costin Leau wrote:
The issue has been fixed in master.
Cheers!
On Thursday, April 25, 2013 7:22:57 PM UTC+3, Abhishek Andhavarapu wrote:
Thanks Costin.
On Thu, Apr 25, 2013 at 10:20 AM, Costin Leau <costi...@gmail.com> wrote:
Looks like an error in ESSerDe for which I've raised an issue:
https://github.com/elasticsearch/elasticsearch-hadoop/issues/39
On Wednesday, April 24, 2013 6:25:35 PM UTC+2, Abhishek Andhavarapu wrote:
Thanks Costin for the reply. Here is the error.
2013-04-24 10:15:50,990 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Adding alias maptest3 to work list for file hdfs://hadoop1.local:8020/user/hive/warehouse/maptest3
2013-04-24 10:15:50,996 INFO org.apache.hadoop.hive.ql.exec.MapOperator: dump TS struct<rid:int,mapids:array<int>,rdate:string,rdata:map<int,string>>
2013-04-24 10:15:50,997 INFO ExecMapper:
<MAP>Id =3
<Children>
<TS>Id =0
<Children>
<SEL>Id =1
<Children>
<FS>Id =2
<Parent>Id = 1 null<\Parent>
<\FS>
<\Children>
<Parent>Id = 0 null<\Parent>
<\SEL>
<\Children>
<Parent>Id = 3 null<\Parent>
<\TS>
<\Children>
<\MAP>
2013-04-24 10:15:50,997 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initializing Self 3 MAP
2013-04-24 10:15:50,997 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing Self 0 TS
2013-04-24 10:15:50,997 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Operator 0 TS initialized
2013-04-24 10:15:50,997 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing children of 0 TS
2013-04-24 10:15:50,997 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing child 1 SEL
2013-04-24 10:15:50,998 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self 1 SEL
2013-04-24 10:15:51,008 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<rid:int,mapids:array<int>,rdate:string,rdata:map<int,string>>
2013-04-24 10:15:51,012 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Operator 1 SEL initialized
2013-04-24 10:15:51,012 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children of 1 SEL
2013-04-24 10:15:51,012 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 2 FS
2013-04-24 10:15:51,012 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 2 FS
2013-04-24 10:15:51,031 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 2 FS initialized
2013-04-24 10:15:51,031 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 2 FS
2013-04-24 10:15:51,031 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done 1 SEL
2013-04-24 10:15:51,031 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initialization Done 0 TS
2013-04-24 10:15:51,031 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initialization Done 3 MAP
2013-04-24 10:15:51,039 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias maptest3 for file hdfs://hadoop1.allegiance.local:8020/user/hive/warehouse/maptest3
2013-04-24 10:15:51,040 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 forwarding 1 rows
2013-04-24 10:15:51,040 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
2013-04-24 10:15:51,043 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 1 rows
2013-04-24 10:15:51,043 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS /user/hive/warehouse/_tmp.maptest1/000000_3
2013-04-24 10:15:51,422 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"rid":1,"mapids":[2,3,4],"rdate":"1234","rdata":{5:"8",6:"9"}}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ArrayStoreException
at java.lang.System.arraycopy(Native Method)
at java.util.ArrayList.toArray(ArrayList.java:306)
at org.elasticsearch.hadoop.hive.ESSerDe.hiveToWritable(ESSerDe.java:136)
at org.elasticsearch.hadoop.hive.ESSerDe.hiveToWritable(ESSerDe.java:197)
at org.elasticsearch.hadoop.hive.ESSerDe.serialize(ESSerDe.java:109)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:586)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:546)
... 9 more
2013-04-24 10:15:51,422 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 finished. closing...
2013-04-24 10:15:51,422 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 forwarded 1 rows
2013-04-24 10:15:51,423 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2013-04-24 10:15:51,423 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2013-04-24 10:15:51,423 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 1 rows
2013-04-24 10:15:51,423 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing...
2013-04-24 10:15:51,423 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarded 1 rows
2013-04-24 10:15:51,423 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 finished. closing...
2013-04-24 10:15:51,423 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 forwarded 0 rows
2013-04-24 10:15:51,423 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:0
2013-04-24 10:15:51,423 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 Close done
2013-04-24 10:15:51,423 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2013-04-24 10:15:51,423 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 Close done
2013-04-24 10:15:51,423 INFO ExecMapper: ExecMapper: processed 0 rows: used memory = 23614288
2013-04-24 10:15:51,435 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-04-24 10:15:51,439 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"rid":1,"mapids":[2,3,4],"rdate":"1234","rdata":{5:"8",6:"9"}}
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"rid":1,"mapids":[2,3,4],"rdate":"1234","rdata":{5:"8",6:"9"}}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
... 8 more
Caused by: java.lang.ArrayStoreException
at java.lang.System.arraycopy(Native Method)
at java.util.ArrayList.toArray(ArrayList.java:306)
at org.elasticsearch.hadoop.hive.ESSerDe.hiveToWritable(ESSerDe.java:136)
at org.elasticsearch.hadoop.hive.ESSerDe.hiveToWritable(ESSerDe.java:197)
at org.elasticsearch.hadoop.hive.ESSerDe.serialize(ESSerDe.java:109)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:586)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:546)
... 9 more
2013-04-24 10:15:51,446 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
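For what it's worth, an ArrayStoreException raised from System.arraycopy inside ArrayList.toArray is the pattern you get when toArray is handed a typed array whose component type doesn't match the list elements. A minimal sketch of that pattern (a hypothetical reconstruction, not the actual ESSerDe code):

```java
import java.util.ArrayList;
import java.util.List;

public class ToArrayMismatch {
    public static void main(String[] args) {
        List<Object> values = new ArrayList<>();
        values.add(Integer.valueOf(2)); // an int element, as in mapids
        try {
            // Copying Integer elements into a String[] blows up inside
            // System.arraycopy, the same frames as in the trace above.
            values.toArray(new String[values.size()]);
        } catch (ArrayStoreException e) {
            System.out.println("ArrayStoreException, as expected");
        }
    }
}
```

That would be consistent with arrays and maps of int failing while arrays of string work.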
Thanks,
On Wednesday, April 24, 2013 12:44:03 AM UTC-6, Costin Leau wrote:
Hi,
1) What's the problem? Is there any error message that you receive? Except for UNIONs, Arrays (or Lists)
as well as Maps should work.
2) The ES-Hadoop integration sits outside ES. It's just something added to the Hadoop environment to talk to
ES, and the reason for that is to take advantage of the map/reduce capabilities, which map
nicely on top of ES.
A river or a single-instance process would render the parallel capabilities of Hadoop void.
3) Hive doesn't support an UPDATE statement, just INSERT and INSERT OVERWRITE, which don't
really apply here since it's an external table. We might extend the INSERT OVERWRITE semantics, but
that is tricky since it requires the notion of an ID; typically INSERT OVERWRITE is the equivalent
of dropping a table and then adding data into it, which is clearly not an update.
You are better off handling the UPDATE directly in ES.
Note that in Hive (as with the rest of the map/reduce frameworks) data is not updated, but
rather copied and transformed.
Cheers,
On Tuesday, April 23, 2013 11:25:37 PM UTC+2, Abhishek Andhavarapu wrote:
Hi All,
I'm trying to push data from Hive to Elasticsearch using external tables
(https://github.com/elasticsearch/elasticsearch-hadoop).
My ES index mapping
{
"rid": 1,
"mapids" : [2,3,4], //Array
"data": [ //Nested objects
{
"mapid": "5",
"value": "g1"
},
{
"mapid": "6",
"value": "g2"
}
]
}
My Hive table structure
CREATE EXTERNAL TABLE maptest_ex(
rid INT,
mapids ARRAY<INT>,
rdata MAP<INT,STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler'
TBLPROPERTIES(
'es.host' = 'elasticsearch1',
'es.resource' = 'radio/artists/')
and I'm trying to push data from the local Hive table to the external table:
insert into table maptest_ex
select rid,mapids,rdata from maptest3
1) The push works for simple data types like int and string, but not for arrays and maps. How do I
push that data from Hive to ES?
2) Is there a Hive river I could use?
3) How do I update a document in ES? (If a row already exists, can the ES storage handler
delete the existing ES document and insert the new/updated doc?)
Any help is appreciated,
Thanks
--
You received this message because you are subscribed to a topic in the Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/BAaoqF6SkiY/unsubscribe?hl=en-US.
To unsubscribe from this group and all its topics, send an email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.