I am trying to connect Elasticsearch through Hive to utilize for its quick full-text searches primarily.
I have created the table
CREATE EXTERNAL TABLE scorecard (
item_id BIGINT,
description STRING,
description_mod1 STRING,
price_mod1 DOUBLE,
price_mod2 DOUBLE
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES( 'es.resource' = 'fulltextsearch/scorecard',
'es.nodes'='elasticsearch',
'es.net.http.auth.user'='user',
'es.net.http.auth.pass'='pass'
);
For reference, my index mapping was created with the following JSON
{
"mappings": {
"scorecard": {
"properties": {
"item_id": {
"type": "long"
},
"description": {
"type": "text"
},
"description_mod1": {
"type": "text"
},
"price_mod1": {
"type": "double"
},
"price_mod2": {
"type": "double"
}
}
}
}
}
I have successfully loaded about a million rows into the index. My issue comes with trying to selectively query rows. For something such as select * from scorecard limit 5
the results are returned immediately as expected. However when attempting to do even a simple where query such as select * from scorecard where item_id ='123'
or select * from scorecard where description = 'exact text'
appear in the hive debug log to be pulling in the full data set into Hive first rather than the expected transcribing of query parameters to es-hadoop.
Am I missing something from the configuration to force the predicates to be transcribed?