Folks,
I am using elasticsearch-hadoop-hive-2.1.0.Beta3.jar
I defined the external table as:
CREATE EXTERNAL TABLE IF NOT EXISTS ${staging_table}(
customer_id STRING,
store_purchase array<map<string,string>>)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
'es.nodes'='localhost:9200',
'es.resource'='user_activity/store',
'es.mapping.id'='customer_id',
'es.input.json'='false',
'es.write.operation'='upsert',
'es.update.script'='ctx._source.store_purchase += purchase',
'es.update.script.params'='purchase:store_purchase'
);
I created a source table with the same column names and loaded some
sample data into it.
Running
INSERT OVERWRITE TABLE ${staging_table}
SELECT customer_id, store_purchase FROM ${test_table}
throws EsHadoopIllegalArgumentException: Field [_col1] needs to be a
primitive; found [array<map<string,string>>]. Is array<map<string,string>>
supported yet? If not, how can I work around this issue?
The exception occurs because you are trying to extract a field (the script
parameter) from a complex type (an array) rather than from a primitive. This
is currently not supported because the internal structure of a complex type
can get quite involved, and its serialized JSON form can end up incorrect.
Any reason why you need to pass the array of maps as a script parameter and
not use primitives instead (you can use Hive column mapping to extract the
ones you need)?
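A rough sketch of what the primitives-only approach could look like in Hive, assuming each purchase map carries the keys item_id, category and department (as in the sample document later in this thread). The table variable ${flat_staging_table}, the parameter names item, cat and dept, and the update script itself are illustrative assumptions and have not been tested against es-hadoop 2.1.0.Beta3. The script also assumes store_purchase already exists in the target document; on a first insert, upsert writes the row itself, so the flat columns would be indexed instead of a store_purchase array.

```sql
-- Hypothetical flattened staging table: one row per purchase, primitive
-- columns only, so every script parameter maps to a primitive field.
-- The update script is an untested assumption that builds one map per row
-- and appends it to an existing store_purchase list.
CREATE EXTERNAL TABLE IF NOT EXISTS ${flat_staging_table}(
  customer_id STRING,
  item_id     STRING,
  category    STRING,
  department  STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes'='localhost:9200',
  'es.resource'='user_activity/store',
  'es.mapping.id'='customer_id',
  'es.write.operation'='upsert',
  'es.update.script'='ctx._source.store_purchase += [item_id: item, category: cat, department: dept]',
  'es.update.script.params'='item:item_id,cat:category,dept:department'
);

-- Flatten array<map<string,string>> into one row per purchase before writing.
INSERT OVERWRITE TABLE ${flat_staging_table}
SELECT customer_id, p['item_id'], p['category'], p['department']
FROM ${test_table}
LATERAL VIEW explode(store_purchase) purchases AS p;
```

The LATERAL VIEW explode step is standard Hive; whether the script behaves as intended depends on the scripting language enabled on the Elasticsearch cluster.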
Costin,
Thanks for your info.
I am mapping an array of maps to nested objects in ES. In this specific
case, the expected document in ES would look like:
{
_id:customer_id,
store_purchase:[{item_id:123, category:'pants', department:'clothes'},
...]
}
so that I can run queries such as: find all users who, between T1 and T2,
purchased items whose department is A and category is B.
Is there any way of achieving this with es-hadoop?
Chen
(In reply to Costin Leau's message of Thursday, March 12, 2015 at 9:18:14 PM
UTC-7, itself a reply to Chen Wang's original post of Thu, Mar 12, 2015 at
11:56 PM; both are quoted in full above.)